SOLUBLE PEPTIDES HAVING
CONSTRAINED, SECONDARY CONFORMATION IN 
SOLUTION AND METHOD OF MAKING SAME 



BACKGROUND OF THE INVENTION 



The biological function of a peptide depends upon
its direct, physical interaction with another molecule.
The peptide or protein is termed the ligand. 

Peptides are distinguishable by their specificity 
for certain ligand-binding proteins. The specificity of
binding, i.e., the discrimination between closely related
ligands, is determined by a peptide's binding affinity. 
Peptides having useful binding properties are invaluable 
for chemotherapy and drug design. Therefore, a need exists 
for the generation of peptides having biologically useful
binding affinities and being soluble in solution.

Secondary structure of a peptide is critical for
determining its binding affinity. For example, a highly
flexible peptide is able to interact with many distinct
molecules; however, the peptide-ligand interaction is
easily disrupted, or in other words, the binding affinity
of the peptide is low. In contrast, a peptide having a specific
secondary structure is able to bind tightly with only one or
a few ligands.

However, if secondary structure of the ligand
results from non-covalent interactions, the peptide
inevitably is insoluble. Intra-peptide covalent bonds can
solve this problem, resulting in constrained peptides, i.e.,
peptides having a stable secondary structure in solution,
that are soluble.

This invention provides a method to synthesize
soluble peptides having constrained, secondary conformation
in solution, as well as the peptides produced by this
method.

This invention also relates generally to methods
for synthesizing and expressing oligonucleotides and, more 
particularly, to methods for expressing oligonucleotides 
having biased, but random codon sequences. 

Oligonucleotide synthesis proceeds via linear
coupling of individual monomers in a stepwise reaction.
The reactions are generally performed on a solid phase
support by first coupling the 3' end of the first monomer
to the support. The second monomer is added to the 5' end
of the first monomer in a condensation reaction to yield a
dinucleotide coupled to the solid support. At the end of
each coupling reaction, the by-products and unreacted, free
monomers are washed away so that the starting material for
the next round of synthesis is the pure oligonucleotide
attached to the support. In this reaction scheme, the
stepwise addition of individual monomers to a single,
growing end of an oligonucleotide ensures accurate synthesis
of the desired sequence. Moreover, unwanted side reactions,
such as the condensation of two oligonucleotides, are
eliminated, resulting in high product yields.

In some instances, it is desired that synthetic
oligonucleotides have random nucleotide sequences. This
result can be accomplished by adding equal proportions of
all four nucleotides in the monomer coupling reactions,
leading to the random incorporation of all nucleotides and
yielding a population of oligonucleotides with random
sequences. Since all possible combinations of nucleotide
sequences are represented within the population, all
possible codon triplets will also be represented. If the
objective is ultimately to generate random peptide
products, this approach has a severe limitation because the
random codons synthesized will bias the amino acids
incorporated during translation of the DNA by the cell into
polypeptides.

The bias is due to the redundancy of the genetic
code. There are four nucleotide monomers, which leads to
sixty-four possible triplet codons. With only twenty amino
acids to specify, many of the amino acids are encoded by
multiple codons. Therefore, a population of
oligonucleotides synthesized by sequential addition of
monomers from a random population will not encode peptides
whose amino acid sequence represents all possible
combinations of the twenty different amino acids in equal
proportions. That is, the frequency of amino acids
incorporated into polypeptides will be biased toward those
amino acids which are specified by multiple codons.
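
The size of this bias can be illustrated by counting how many of the sixty-four codons specify each amino acid. The following is a minimal sketch, assuming the standard genetic code and equal incorporation of all four monomers at every step; the names CODON_TABLE and amino_acid_frequencies are illustrative only and do not appear in the disclosure.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_frequencies():
    """Return the fraction of the 64 codons that specify each residue."""
    counts = Counter(CODON_TABLE.values())
    return {aa: n / 64 for aa, n in counts.items()}


if __name__ == "__main__":
    ranked = sorted(amino_acid_frequencies().items(), key=lambda kv: -kv[1])
    for aa, freq in ranked:
        print(f"{aa}: {freq:.3f}")
```

Under these assumptions leucine, serine and arginine are each expected at 6/64 of positions, whereas methionine and tryptophan are each expected at only 1/64, and 3/64 of positions terminate translation.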



To alleviate amino acid bias due to the
redundancy of the genetic code, the oligonucleotides can be
synthesized from nucleotide triplets. Here, a triplet
coding for each of the twenty amino acids is synthesized
from individual monomers. Once synthesized, the triplets
are used in the coupling reactions instead of individual
monomers. By mixing equal proportions of the triplets,
synthesis of oligonucleotides with random codons can be
accomplished. However, this is not practical because of the
inefficiency of the coupling, which is less than 3%, and the
high cost of synthesis.

Amino acid bias can be reduced, however, by
synthesizing the degenerate codon sequence NNK, where N is
a mixture of all four nucleotides and K is a mixture of
guanine and thymine nucleotides. Each position within an
oligonucleotide having this codon sequence will contain a
total of 32 codons (12 amino acids being represented once,
5 represented twice, 3 represented three times and one
codon being a stop codon). Oligonucleotides
expressed with such degenerate codon sequences will produce
peptide products whose sequences are biased toward those
amino acids being represented more than once. Thus,
populations of peptides whose sequences are completely
random cannot be obtained from oligonucleotides synthesized
from degenerate sequences.
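
The codon counts quoted for NNK can be checked by enumeration. The sketch below assumes the standard genetic code; nnk_codons and nnk_representation are hypothetical helper names introduced only for this illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Enumerate every codon matching NNK (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


def nnk_representation():
    """Count how many NNK codons specify each residue ('*' = stop)."""
    return Counter(CODON_TABLE[c] for c in nnk_codons())


if __name__ == "__main__":
    for residue, n in sorted(nnk_representation().items()):
        print(residue, n)
```

The enumeration yields 32 codons, of which one is a stop codon; the residues represented three times are leucine, serine and arginine.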

There thus exists a need for a method to express
oligonucleotides having a fully random or desirably biased
sequence which alleviates genetic redundancy. The present
invention satisfies this need and provides additional
advantages as well.

SUMMARY OF THE INVENTION 

This invention provides a peptide having
constrained, secondary structure in solution as well as
methods of synthesizing these peptides.

The invention provides a plurality of procaryotic
cells containing a diverse population of expressible
oligonucleotides encoding soluble peptides having
constrained secondary structure or conformation in
solution, the expressible oligonucleotides being
operationally linked to expression elements, the
expressible oligonucleotides further characterized as
having a desirable bias of random codon sequences.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic drawing for synthesizing 
oligonucleotides from nucleotide monomers with random 
tuplets at each position using twenty reaction vessels. 

Figure 2 is a schematic drawing for synthesizing 
oligonucleotides from nucleotide monomers with random
tuplets at each position using ten reaction vessels. 

Figure 3 is a schematic diagram of the two 
vectors used for sublibrary and library production from 
precursor oligonucleotide portions. M13IX22 (Figure 3A) is 
the vector used to clone the anti-sense precursor portions
(hatched box). The single-headed arrow represents the Lac
p/o expression sequences and the double-headed arrow
represents the portion of M13IX22 which is to be combined
with M13IX42. The amber stop codon for biological
selection and relevant restriction sites are also shown.
M13IX42 (Figure 3B) is the vector used to clone the sense
precursor portions (open box). Thick lines represent the
pseudo-wild type (ΨgVIII) and wild type (gVIII) gene VIII
sequences. The double-headed arrow represents the portion
of M13IX42 which is to be combined with M13IX22. The two
amber stop codons and relevant restriction sites are also
shown. Figure 3C shows the joining of vector population
from sublibraries to form the functional surface expression
vector M13IX. Figure 3D shows the generation of a surface
expression library in a non-suppressor strain and the
production of phage. The phage are used to infect a
suppressor strain (Figure 3E) for surface expression and
screening of the library.

Figure 4 is a schematic diagram of the vector
used for generation of surface expression libraries from
random oligonucleotide populations (M13IX30). The symbols 
are as described for Figure 3. 

Figure 5 is the nucleotide sequence of M13IX42
(SEQ ID NO: 1).

Figure 6 is the nucleotide sequence of M13IX22
(SEQ ID NO: 2).

Figure 7 is the nucleotide sequence of M13IX30
(SEQ ID NO: 3).

Figure 8 is the nucleotide sequence of M13ED03
(SEQ ID NO: 4).

Figure 9 is the nucleotide sequence of M13IX421
(SEQ ID NO: 5).

Figure 10 is the nucleotide sequence of M13ED04
(SEQ ID NO: 6).

DETAILED DESCRIPTION OF THE INVENTION 

This invention is directed to a simple and
inexpensive method for synthesizing and expressing
oligonucleotides having a desirable bias of random codons
using individual monomers. The oligonucleotides produced
by this method encode soluble peptides having constrained
secondary structure in solution. The method is
advantageous in that individual monomers are used instead
of triplets and, by synthesizing only a non-degenerate
subset of all triplets, codon redundancy is alleviated.
Thus, the oligonucleotides synthesized represent a large
proportion of possible random triplet sequences which can
be obtained. The oligonucleotides can be expressed, for
example, on the surface of filamentous bacteriophage in a
form which does not alter phage viability or impose
biological selections against certain peptide sequences.
The oligonucleotides produced are therefore useful for
generating an unlimited number of pharmacological and
research products.

This invention entails the sequential coupling of
monomers to produce oligonucleotides with a desirable bias
of random codons. The coupling reactions for the
randomization of twenty codons which specify the amino
acids of the genetic code are performed in ten different
reaction vessels. Each reaction vessel contains a support
on which the monomers for two different codons are coupled
in three sequential reactions. One of the reactions
couples an equal mixture of two monomers such that the
final product has two different codon sequences. The
codons are randomized by removing the supports from the
reaction vessels and mixing them to produce a single batch
of supports containing all twenty codons at a particular
position. Synthesis at the next codon position proceeds by
equally dividing the mixed batch of supports into ten
reaction vessels as before and sequentially coupling the
monomers for each pair of codons. The supports are again
mixed to randomize the codons at the position just
synthesized. The cycle of coupling, mixing and dividing
continues until the desired number of codon positions have
been randomized. After the last position has been
randomized, the oligonucleotides with random codons are
cleaved from the support. The random oligonucleotides can
then be expressed, for example, on the surface of
filamentous bacteriophage as gene VIII-peptide fusion
proteins. Alternative genes can be used as well. Using
this method, one can randomize oligonucleotides at certain
positions and select for specific oligonucleotides at
others.



This invention provides a diverse population of
synthetic biased oligonucleotides contained in vectors so
as to be expressible in cells. In the preferred embodiment
of this invention, the oligonucleotides are fully defined
in that at least two codons encode amino acids capable of
forming a covalent bond. The populations of
oligonucleotides can be expressed as fusion products in
combination with surface proteins of filamentous
bacteriophage, such as M13, as with gene VIII. The vectors
can be transfected into a plurality of cells, such as the
procaryote E. coli.

In one embodiment, the diverse population of 
oligonucleotides can be formed by randomly combining first 
and second precursor populations, each or either precursor 
population having a desirable bias of random codon 
sequences. Methods of synthesizing and expressing the
diverse population of expressible oligonucleotides are also
provided.

Two precursor populations of random precursor 
oligonucleotides are synthesized in one embodiment. The 
oligonucleotides within each population encode a portion of 
the final oligonucleotide that is expressed. 
Oligonucleotides within one precursor population encode the 
carboxy terminal portion of the expressed oligonucleotides. 
In one embodiment, these oligonucleotides are cloned in 
frame with a gene VIII (gVIII) sequence so that translation 
of the sequence produces peptide fusion proteins. The 
second population of precursor oligonucleotides is cloned
into a separate vector. Each precursor oligonucleotide
within this population encodes the anti-sense of the amino
terminal portion of the expressed oligonucleotides. This
vector also contains the elements necessary for expression. 
The two vectors containing the random oligonucleotides are 
combined such that the two precursor oligonucleotide 
portions are joined together at random to form a population 
of larger oligonucleotides derived from two smaller 
portions. The vectors contain selectable markers to ensure 
maximum efficiency in joining together the two 
oligonucleotide populations. A mechanism also exists to 
control the expression of gVIII-peptide fusion proteins 
during library construction and screening. 

As used herein, the term "monomer" or "nucleotide 
monomer" refers to individual nucleotides used in the 
chemical synthesis of oligonucleotides. Monomers that can 
be used include both the ribo- and deoxyribo- forms of each 
of the five standard nucleotides (derived from the bases
adenine (A or dA, respectively), guanine (G or dG),
cytosine (C or dC), thymine (T) and uracil (U)).
Derivatives and precursors of bases such as inosine which
are capable of supporting polypeptide biosynthesis are also
included as monomers. Also included are chemically
modified nucleotides, for example, one having a reversible
blocking agent attached to any of the positions on the
purine or pyrimidine bases, the ribose or deoxyribose sugar
or the phosphate or hydroxyl moieties of the monomer. Such
blocking groups include, for example, dimethoxytrityl,
benzoyl, isobutyryl, beta-cyanoethyl and diisopropylamine 
groups, and are used to protect hydroxyls, exocyclic amines 
and phosphate moieties. Other blocking agents can also be 
used and are known to one skilled in the art. 

As used herein, the term "tuplet" refers to a
group of elements of a definable size. The elements of a
tuplet as used herein are nucleotide monomers. For 
example, a tuplet can be a dinucleotide, a trinucleotide or 
can also be four or more nucleotides. 

As used herein, the term "codon" or "triplet"
refers to a tuplet consisting of three adjacent nucleotide
monomers which specify one of the twenty naturally
occurring amino acids found in polypeptide biosynthesis.
The term also includes nonsense, or stop, codons which do
not specify any amino acid.

"Random codons" or "randomized codons," as used 
herein, refers to more than one codon at a position within 
a collection of oligonucleotides. The number of different 
codons can be from two to twenty at any particular
position. "Randomized oligonucleotides," as used herein,
refers to a collection of oligonucleotides with random 
codons at one or more positions. "Random codon sequences" 
as used herein means that more than one codon position 
within a randomized oligonucleotide contains random codons.
For example, if randomized oligonucleotides are six
nucleotides in length (i.e., two codons) and both the first
and second codon positions are randomized to encode all
twenty amino acids, then a population of oligonucleotides
having random codon sequences with every possible
combination of the twenty triplets in the first and second
position makes up the above population of randomized
oligonucleotides. The number of possible codon
combinations is 20^2. Likewise, if randomized
oligonucleotides of fifteen nucleotides in length are
synthesized which have random codon sequences at all
positions encoding all twenty amino acids, then all
triplets coding for each of the twenty amino acids will be
found in equal proportions at every position. The
population constituting the randomized oligonucleotides
will contain 20^5 different possible species of
oligonucleotides. "Random tuplets," or "randomized
tuplets" are defined analogously.
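
The two counts in this definition follow from raising the number of codons per position to the number of randomized positions: six nucleotides are two codons, and fifteen nucleotides are five codons. A minimal sketch of that arithmetic is given below; the function names are illustrative assumptions, and the three-codon enumeration is only a small check of the formula.

```python
from itertools import product


def count_codon_combinations(codons_per_position: int, positions: int) -> int:
    """Number of distinct oligonucleotides when every position is randomized."""
    return codons_per_position ** positions


def enumerate_combinations(codons, positions):
    """List every oligonucleotide obtainable from the given codon set."""
    return ["".join(combo) for combo in product(codons, repeat=positions)]


if __name__ == "__main__":
    print(count_codon_combinations(20, 2))   # two codon positions
    print(count_codon_combinations(20, 5))   # five codon positions
    print(enumerate_combinations(["TTT", "GTT", "TCT"], 2))
```

With twenty codons per position this gives 400 species for two positions and 3,200,000 species for five positions.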

As used herein, the term "bias" refers to a
preference. It is understood that there can be degrees of
preference or bias toward codon sequences which encode
particular amino acids. For example, an oligonucleotide
whose codon sequences do not preferentially encode particular
amino acids is unbiased and therefore completely random.
The oligonucleotide codon sequences can also be biased
toward predetermined codon sequences or codon frequencies
and, while still diverse and random, will exhibit codon
sequences biased toward a defined, or preferred, sequence.
"A desirable bias of random codon sequences," as used
herein, refers to the predetermined degree of bias which
can be selected from totally random to essentially, but not
totally, defined (or preferred). There must be at least
one codon position which is variable, however.

As used herein, the term "support" refers to a
solid phase material for attaching monomers for chemical
synthesis. Such support is usually composed of materials
such as beads of controlled pore glass but can be other
materials known to one skilled in the art. The term is
also meant to include one or more monomers coupled to the
support for additional oligonucleotide synthesis reactions.

As used herein, the terms "coupling" or
"condensing" refer to the chemical reactions for attaching
one monomer to a second monomer or to a solid support.
Such reactions are known to one skilled in the art and are
typically performed on an automated DNA synthesizer such as
a MilliGen/Biosearch Cyclone Plus Synthesizer using
procedures recommended by the manufacturer. "Sequentially
coupling," as used herein, refers to the stepwise addition
of monomers.

The term "soluble peptide" means a peptide that
is soluble at a concentration equivalent to its affinity to
a receptor. The peptide can then be used in aqueous 
solution without being attached to a cell or phage. 

The term "constrained secondary structure in 
solution" means a peptide having a covalent bond that is
not the backbone peptide bond. 

A method of synthesizing oligonucleotides having
biased random tuplets using individual monomers is
described. The method consists of several steps, the first
being synthesis of a nucleotide tuplet for each tuplet to
be randomized. As described here and below, a nucleotide
triplet (i.e., a codon) will be used as a specific example
of a tuplet. Any size tuplet will work using the methods
disclosed herein, and one skilled in the art would know how
to use the methods to randomize tuplets of any size.

If the randomization of codons specifying all 
twenty amino acids is desired at a position, then twenty 
different codons are synthesized. Likewise, if
randomization of only ten codons at a particular position
is desired, then those ten codons are synthesized.
Randomization of codons from two to sixty-four can be
accomplished by synthesizing each desired triplet.
Preferably, randomization of from two to twenty codons is
used for any one position because of the redundancy of the
genetic code. The codons selected at one position do not
have to be the same codons selected at the next position.
Additionally, the sense or anti-sense sequence
oligonucleotide can be synthesized. The process therefore
provides for randomization of any desired codon position 
with any number of codons. In addition, it also allows one 
to preselect a specified codon to be present at a 
particular position within a randomized sequence. 

Codons to be randomized are synthesized
sequentially by coupling the first monomer of each codon to
separate supports. The supports for the synthesis of each
codon can, for example, be contained in different reaction
vessels such that one reaction vessel corresponds to the
monomer coupling reactions for one codon. As will be used
here and below, if twenty codons are to be randomized, then
twenty reaction vessels can be used in independent coupling
reactions for the first monomer of each of the twenty codons.
Synthesis proceeds by sequentially coupling the second
monomer of each codon to the first monomer to produce a
dimer, followed by coupling the third monomer for each
codon to each of the above-synthesized dimers to produce a
trimer (Figure 1, step 1, where M1, M2 and M3 represent the
first, second and third monomer, respectively, for each
codon to be randomized).

Following synthesis of the first codons from
individual monomers, the randomization is achieved by
mixing the supports from all twenty reaction vessels which
contain the individual codons to be randomized. The solid
phase support can be removed from its vessel and mixed to
achieve a random distribution of all codon species within
the population (Figure 1, step 2). The mixed population of
supports, constituting all codon species, is then
redistributed into twenty independent reaction vessels
(Figure 1, step 3). The resultant vessels are all
identical and contain equal portions of all twenty codons
coupled to a solid phase support.

For randomization of the second position codon,
synthesis of twenty additional codons is performed, one in
each of the twenty reaction vessels produced in step 3,
using their supports as the condensing substrates of step 1
(Figure 1, step 4). Steps
1 and 4 are therefore equivalent except that step 4 uses
the supports produced by the previous synthesis cycle
(steps 1 through 3) for codon synthesis whereas step 1 is
the initial synthesis of the first codon in the
oligonucleotide. The supports resulting from step 4 will
each have two codons attached to them (i.e., a
hexanucleotide) with the codon at the first position being
any one of twenty possible codons (i.e., random) and the
codon at the second position being one of the twenty
possible codons.

For randomization of the codon at the second
position and synthesis of the third position codon, steps
2 through 4 are again repeated. This process yields in
each vessel a three codon oligonucleotide (i.e., 9
nucleotides) with codon positions 1 and 2 randomized and
position three containing one of the twenty possible
codons. Steps 2 through 4 are repeated to randomize the
third position codon and synthesize the codon at the next
position. The process is continued until an
oligonucleotide of the desired length is achieved. After
the final randomization step, the oligonucleotide can be
cleaved from the supports and isolated by methods known to
one skilled in the art. Alternatively, the
oligonucleotides can remain on the supports for use in
methods employing probe hybridization.
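
The couple, mix and divide cycle of steps 1 through 4 can be sketched as a small simulation. This is a simplified model under stated assumptions, not the disclosed chemistry: each support is represented by the string synthesized on it, a whole codon is attached in one operation rather than as three monomer couplings, and strings are written in order of synthesis rather than in 5' to 3' orientation. The function name split_and_mix and its parameters are hypothetical.

```python
import random


def split_and_mix(codons, positions, beads_per_vessel=50, seed=0):
    """Simulate repeated mixing, dividing and coupling on a pool of supports.

    One reaction vessel is used per codon, so every vessel couples a
    single codon in each round.
    """
    rng = random.Random(seed)
    n_vessels = len(codons)
    beads = [""] * (n_vessels * beads_per_vessel)
    for _ in range(positions):
        # Mix all supports into a single batch (a no-op in the first round).
        rng.shuffle(beads)
        # Divide the batch into equal portions, one per reaction vessel.
        portions = [beads[i::n_vessels] for i in range(n_vessels)]
        # Couple the codon assigned to each vessel to every support in it.
        beads = [
            bead + codon
            for portion, codon in zip(portions, codons)
            for bead in portion
        ]
    return beads


if __name__ == "__main__":
    pool = split_and_mix(["TTT", "GTT", "TCT", "CCT"], positions=3)
    print(len(pool), "supports;", len(set(pool)), "distinct sequences")
```

Because the batch is divided equally before each coupling, every codon appears on the same number of supports at every position, which is the property the mixing and dividing steps are intended to provide.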

The diversity of codon sequences, i.e., the 
number of different possible oligonucleotides, that can be 
obtained using the methods of the present invention, is 
extremely large and only limited by the physical
characteristics of available materials. For example, a
support composed of beads of about 100 µm in diameter will
be limited to about 10,000 beads/reaction vessel using a 1
reaction vessel containing 25 mg of beads. This size

bead can support about 1 x 10"' oligonucleotides per bead. 
Synthesis using separate reaction vessels for each of the 
twenty amino acids will produce beads in which all the 
oligonucleotides attached to an individual bead are
identical. The diversity which can be obtained under these
conditions is approximately 10^ copies of 10,000 x 20 or 
200,000 different random oligonucleotides. The diversity
can be increased, however, in several ways without 
departing from the basic methods disclosed herein. For
example, the number of possible sequences can be increased
by decreasing the size of the individual beads which make
up the support. A bead of about 30 µm in diameter will
increase the number of beads per reaction vessel and 
therefore the number of oligonucleotides synthesized.
Another way to increase the diversity of oligonucleotides
with random codons is to increase the volume of the
reaction vessel. For example, using the same size bead, a
larger vessel can contain a greater number of beads than a
smaller vessel and therefore support the synthesis of a
greater number of oligonucleotides. Increasing the number
of codons coupled to a support in a single reaction vessel
also increases the diversity of the random
oligonucleotides. The total diversity will be the number
of codons coupled per vessel raised to the number of codon
positions synthesized. For example, using ten reaction
vessels, each synthesizing two codons to randomize a total
of twenty codons, the number of different oligonucleotides
of ten codons in length per 100 µm bead can be increased
where each bead will contain about 2^10 or 1 x 10^3 different
sequences instead of one. One skilled in the art will know
how to modify such parameters to increase the diversity of
oligonucleotides with random codons.
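
The two diversity figures in this paragraph reduce to simple arithmetic, sketched below. The function names are illustrative assumptions; the inputs (10,000 beads per vessel, twenty vessels, two codons per vessel, ten codon positions) are the values stated in the text.

```python
def distinct_beads(beads_per_vessel: int, vessels: int) -> int:
    """Upper bound on distinct sequences when each bead carries one sequence."""
    return beads_per_vessel * vessels


def sequences_per_bead(codons_per_vessel: int, positions: int) -> int:
    """Distinct sequences on one bead when each vessel couples several codons."""
    return codons_per_vessel ** positions


if __name__ == "__main__":
    print(distinct_beads(10_000, 20))   # one codon per vessel, twenty vessels
    print(sequences_per_bead(1, 10))    # one codon per vessel, ten positions
    print(sequences_per_bead(2, 10))    # two codons per vessel, ten positions
```

With one codon per vessel a bead carries a single sequence, whereas with two codons per vessel over ten positions it carries 1,024 sequences, in line with the approximate figure given above.
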

A method of synthesizing oligonucleotides having
random codons at each position using individual monomers
wherein the number of reaction vessels is less than the
number of codons to be randomized is also described. For
example, if twenty codons are to be randomized at each
position within an oligonucleotide population, then ten
reaction vessels can be used. The use of a smaller number
of reaction vessels than the number of codons to be
randomized at each position is preferred because the
smaller number of reaction vessels is easier to manipulate
and results in a greater number of possible
oligonucleotides synthesized.

The use of a smaller number of reaction vessels
for random synthesis of twenty codons at a desired position
within an oligonucleotide is similar to that described
above using twenty reaction vessels except that each
reaction vessel can contain the synthesis products of more
than one codon. For example, step one synthesis using ten
reaction vessels proceeds by coupling about two different
codons on supports contained in each of ten reaction
vessels. This is shown in Figure 2 where each of the two
codons coupled to a different support can consist of the
following sequences: (1) (T/G)TT for Phe and Val; (2)
(T/C)CT for Ser and Pro; (3) (T/C)AT for Tyr and His; (4)
(T/C)GT for Cys and Arg; (5) (C/A)TG for Leu and Met; (6)
(C/G)AG for Gln and Glu; (7) (A/G)CT for Thr and Ala; (8)
(A/G)AT for Asn and Asp; (9) (T/G)GG for Trp and Gly and
(10) A(T/A)A for Ile and Lys. The slash (/) signifies that
a mixture of the monomers indicated on each side of the
slash is used as if it were a single monomer in the
indicated coupling step. The antisense sequence for each
of the above codons can be generated by synthesizing the
complementary sequence. For example, the antisense for Phe
and Val can be AA(C/A). The amino acids encoded by each of
the above pairs of sequences are given in the standard
three letter nomenclature.
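
The ten mixed-base sequences can be expanded and translated to see which two residues each vessel produces. The sketch below assumes the standard genetic code and uses one-letter residue codes; VESSELS, expand and residues_by_vessel are names introduced for illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One entry per reaction vessel; a two-letter string is a mixed-monomer step.
VESSELS = [
    ("TG", "T", "T"),  # (T/G)TT
    ("TC", "C", "T"),  # (T/C)CT
    ("TC", "A", "T"),  # (T/C)AT
    ("TC", "G", "T"),  # (T/C)GT
    ("CA", "T", "G"),  # (C/A)TG
    ("CG", "A", "G"),  # (C/G)AG
    ("AG", "C", "T"),  # (A/G)CT
    ("AG", "A", "T"),  # (A/G)AT
    ("TG", "G", "G"),  # (T/G)GG
    ("A", "TA", "A"),  # A(T/A)A
]


def expand(degenerate):
    """Return every codon produced by one vessel's three coupling steps."""
    return ["".join(c) for c in product(*degenerate)]


def residues_by_vessel():
    """Translate the codons made in each vessel into one-letter residues."""
    return [[CODON_TABLE[c] for c in expand(v)] for v in VESSELS]


if __name__ == "__main__":
    for number, residues in enumerate(residues_by_vessel(), start=1):
        print(number, residues)
```

Under the standard code the ten vessels together specify each of the twenty amino acids exactly once and produce no stop codon; the tenth sequence expands to ATA and AAA, which encode isoleucine and lysine.
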

Coupling of the monomers in this fashion will
yield codons specifying all twenty of the naturally
occurring amino acids attached to supports in ten reaction
vessels. However, the number of individual reaction
vessels to be used will depend on the number of codons to
be randomized at the desired position and can be determined
by one skilled in the art. For example, if ten codons are
to be randomized, then five reaction vessels can be used
for coupling. The codon sequences given above can be used
for this synthesis as well. The sequences of the codons
can also be changed to incorporate or be replaced by any of
the additional forty-four codons which constitute the
genetic code.

The remaining steps of synthesis of
oligonucleotides with random codons using a smaller number
of reaction vessels are as outlined above for synthesis
with twenty reaction vessels except that the mixing and
dividing steps are performed with supports from about half
the number of reaction vessels. These remaining steps are
shown in Figure 2 (steps 2 through 4).

Oligonucleotides having at least one specified
tuplet at a predetermined position and the remaining
positions having random tuplets are synthesized using the
methods described herein. The synthesis steps are similar
to those outlined above using twenty or fewer reaction
vessels except that prior to synthesis of the specified
codon position, the dividing of the supports into separate
reaction vessels for synthesis of different codons is
omitted. For example, if the codon at the second position
of the oligonucleotide is to be specified, then following
synthesis of random codons at the first position and mixing
of the supports, the mixed supports are not divided into
new reaction vessels but, instead, are contained in a
single reaction vessel to synthesize the specified codon.
The specified codon is synthesized sequentially from
individual monomers as described above. Thus, the number
of reaction vessels is increased or decreased at each step
to allow for the synthesis of a specified codon or a
desired number of random codons. In the most preferred
embodiment of this invention, the specified codons are
codons encoding amino acids capable of forming covalent
bonds, e.g., cysteine, glutamic acid, lysine, leucine and
tyrosine.

Following codon synthesis, the mixed supports are
divided into individual reaction vessels for synthesis of
the next codon to be randomized (Figure 1, step 3) or can
be used without separation for synthesis of a consecutive
specified codon. The rounds of synthesis can be repeated
for each codon to be added until the desired number of
positions with predetermined or randomized codons are
obtained.
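
Alternating specified and randomized positions can be modelled by letting the number of vessels change from round to round. In the hedged sketch below, a position given as a list is randomized (one vessel per codon) and a position given as a single string is specified (one undivided vessel). The function synthesize and the example plan, which places the cysteine codon TGT at two fixed positions, are assumptions chosen for illustration; whole codons are attached in one step and strings are written in order of synthesis.

```python
import random


def synthesize(plan, beads=120, seed=0):
    """Build oligonucleotides on a pool of supports following `plan`.

    Each plan entry is either a list of codons (randomized position) or
    a single codon string (specified position).
    """
    rng = random.Random(seed)
    supports = [""] * beads
    for step in plan:
        rng.shuffle(supports)  # mix the supports from the previous round
        if isinstance(step, str):
            # Specified codon: the mixed supports stay in one vessel.
            supports = [s + step for s in supports]
        else:
            # Randomized position: divide among one vessel per codon.
            n = len(step)
            portions = [supports[i::n] for i in range(n)]
            supports = [
                s + codon
                for portion, codon in zip(portions, step)
                for s in portion
            ]
    return supports


if __name__ == "__main__":
    random_set = ["TTT", "GTT", "TCT"]
    pool = synthesize([random_set, "TGT", random_set, "TGT"])
    print(sorted(set(pool)))
```

Every sequence in the resulting pool carries the specified codon at the second and fourth positions, while the first and third positions vary across the pool.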

Oligonucleotides with the first position codon
being specified can also be synthesized
using the above method. In this case, the first position
codon is synthesized from the appropriate monomers. The
supports are divided into the required number of reaction
vessels needed for synthesis of random codons at the second
position and the rounds of synthesis, mixing and dividing
are performed as described above.

A method of synthesizing oligonucleotides having 
tuplets which are diverse but biased toward a predetermined 
sequence is also described herein. This method employs two 
reaction vessels, one vessel for the synthesis of a 
predetermined sequence and the second vessel for the 
synthesis of a random sequence. This method is 
advantageous when a significant number of codon positions, 
for example, are to be of a specified sequence, since it 
avoids the use of multiple reaction vessels. Instead, in 
one vessel, a mixture of four different monomers such as 
adenine, guanine, cytosine and thymine nucleotides is used 
for the first and second monomers in the codon. The codon 
is completed by coupling a mixture of a pair of monomers of 
either guanine and thymine or cytosine and adenine 
nucleotides at the third monomer position. In the other 
vessel, nucleotide monomers are coupled sequentially to 
yield the predetermined codon sequence. Mixing of the two 
supports yields a population of oligonucleotides containing 
both the predetermined codon and the random codons at the 
desired position. Synthesis can proceed by using this 
mixture of supports in a single reaction vessel, for 
example, for coupling additional predetermined codons or by 
further dividing the mixture into two reaction vessels for 
synthesis of additional random codons. 
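
As a rough illustration of the random-sequence vessel described 
above, the following Python sketch enumerates the codons that 
result when all four monomers are coupled at the first and second 
positions and a guanine/thymine pair is coupled at the third. The 
codon table is the standard genetic code; the names CODON_TABLE 
and random_codons are illustrative assumptions and are not part 
of the method as described. 

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def random_codons(first="ACGT", second="ACGT", third="GT"):
    """Return every codon obtainable from the given monomer mixtures."""
    return ["".join(c) for c in product(first, second, third)]


codons = random_codons()
amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
stops = [c for c in codons if CODON_TABLE[c] == "*"]
print(len(codons), "codons,", len(amino_acids), "amino acids, stop codons:", stops)
```

Under these assumptions the guanine/thymine mixture at the third 
position still reaches every amino acid while leaving the amber 
codon (TAG) as the only stop codon in the population. 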

The two reaction vessel method can be used for 
codon synthesis within an oligonucleotide with a 
predetermined tuplet sequence by dividing the support 
mixture into two portions at the desired codon position to 
be randomized. Additionally, this method allows for the 
extent of randomization to be adjusted. For example, 
unequal mixing or dividing of the two supports will change 
the fraction of codons with predetermined sequences 
compared to those with random codons at the desired 
position. Unequal mixing and dividing of supports can be 
useful when there is a need to synthesize random codons at 
a significant number of positions within an oligonucleotide 
of a longer or shorter length. 

The extent of randomization can also be adjusted 
by using unequal mixtures of monomers in the first, second 
and third monomer coupling steps of the random codon 
position. The unequal mixtures can be in any or all of the 
coupling steps to yield a population of codons enriched in 
sequences reflective of the monomer proportions. 
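
The effect of unequal monomer mixtures can be sketched 
numerically. The example below assumes that each coupling step is 
independent and that all monomers couple with equal efficiency, 
so the expected frequency of a codon is the product of the 
monomer fractions at its three positions. The proportions shown 
and the function name codon_frequencies are hypothetical. 

```python
from itertools import product


def codon_frequencies(mix1, mix2, mix3):
    """Expected codon frequencies from per-position monomer fractions.

    Each argument maps a base to its fraction in that coupling mixture.
    Couplings are assumed independent and equally efficient (a simplification).
    """
    freqs = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(
        mix1.items(), mix2.items(), mix3.items()
    ):
        freqs[b1 + b2 + b3] = p1 * p2 * p3
    return freqs


equal = {base: 0.25 for base in "ACGT"}
biased_first = {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}  # hypothetical proportions
third = {"G": 0.5, "T": 0.5}

freqs = codon_frequencies(biased_first, equal, third)
print(round(freqs["ATG"], 4), round(freqs["CTG"], 4))
```

In this sketch, doubling the fraction of adenine in the first 
coupling step doubles the expected frequency of every codon that 
begins with adenine relative to codons beginning with any other 
single base. 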

Synthesis of randomized oligonucleotides is 
performed using methods well known to one skilled in the 
art. Linear coupling of monomers can, for example, be 
accomplished using phosphoramidite chemistry with a 
MilliGen/Biosearch Cyclone Plus automated synthesizer as 
described by the manufacturer (Millipore, Burlington, MA). 
Other chemistries and automated synthesizers can be 
employed as well and are known to one skilled in the art. 

Synthesis of multiple codons can be performed 
without modification to the synthesizer by separately 
synthesizing the codons in individual sets of reactions. 
Alternatively, modification of an automated DNA synthesizer 
can be performed for the simultaneous synthesis of codons 
in multiple reaction vessels. 

In one embodiment, the invention provides a 
plurality of procaryotic cells containing a diverse 
population of expressible oligonucleotides operationally 
linked to expression elements, the expressible 
oligonucleotides having a desirable bias of random codon 
sequences. These oligonucleotides can, in one embodiment, 
be produced from diverse combinations of first and second 
precursor oligonucleotides having a desirable bias of 
random sequences. The invention provides for a method for 
constructing such a plurality of procaryotic cells as well. 

The oligonucleotides synthesized by the above 
methods can be used to express a plurality of random 
soluble peptides having constrained secondary structure in 
solution, which are diverse but biased toward a 
predetermined sequence or which contain at least one 
specified codon at a predetermined position. The need will 
determine which type of oligonucleotide is to be expressed 
to give the resultant population of random peptides and is 
known to one skilled in the art. Expression can be 
performed in any compatible vector/host system. Such 
systems include, for example, plasmids or phagemids in 
procaryotes such as E. coli, yeast systems, and other 
eucaryotic systems such as mammalian cells, but will be 
described herein in context with its presently preferred 
embodiment, i.e., expression on the surface of filamentous 
bacteriophage. Filamentous bacteriophage can be, for 
example, M13, f1 and fd. Such phage have circular 
single-stranded genomes and double-stranded replicative DNA 
forms. Additionally, the peptides can also be expressed in 
soluble or secreted form depending on the need and the 
vector/host system employed. Furthermore, this invention 
provides host cells containing the expressible 
oligonucleotides, the vectors and the isolated soluble, 
stable peptides produced by growing a host cell described 
above under conditions favoring expression of the 
oligonucleotide, and isolating the peptide so produced. 

For the purpose of illustration only, expression 
of random peptides on the surface of M13 can be 
accomplished, for example, using the vector system shown in 
Figure 3. Construction of the vectors, enabling one of 
ordinary skill to make them, is explicitly set out in 
Examples I and II. The complete nucleotide sequences are 
given in Figures 5, 6 and 7 (SEQ ID NOS: 1, 2 and 3, 
respectively). This system produces random 
oligonucleotides functionally linked to expression elements 
and to gVIII by combining two smaller oligonucleotide 
portions contained in separate vectors into a single 
vector. The diversity of oligonucleotide species obtained 
by this system or others described herein can be 5 x 10^ or 
greater. Diversity of less than 5 x 10^ can also be 
obtained and will be determined by the need and type of 
random peptides to be expressed. The random combination of 
two precursor portions into a larger oligonucleotide 
increases the diversity of the population several fold and 
has the added advantage of producing oligonucleotides 
larger than what can be synthesized by standard methods. 
Additionally, although the correlation is not known, when 
the number of possible paths an oligonucleotide can take 
during synthesis such as described herein is greater than 
the number of beads, then there will be a correlation 
between the synthesis path and the sequences obtained. By 
combining oligonucleotide populations which are synthesized 
separately, this correlation will be destroyed. Therefore, 
any bias which may be inherent in the synthesis procedures 
will be alleviated by joining two precursor portions into 
a contiguous random oligonucleotide. 

Populations of precursor oligonucleotides to be 
combined into an expressible form are each cloned into 
separate vectors. The two precursor portions which make up 
the combined oligonucleotide correspond to the carboxy and 
amino terminal portions of the expressed peptide. Each 
precursor oligonucleotide can encode either the sense or 
anti-sense strand, and the choice will depend on the 
orientation of the expression elements and the gene 
encoding the fusion portion of the protein as well as the 
mechanism used to join the two precursor oligonucleotides. 
For the vectors shown in Figure 3, precursor 
oligonucleotides corresponding to the carboxy terminal 
portion of the peptide encode the sense strand. Those 
corresponding to the amino terminal portion encode the 
anti-sense strand. Oligonucleotide populations are 
inserted between the Eco RI and Sac I restriction enzyme 
sites in M13IX22 and M13IX42 (Figure 3A and B). M13IX42 
(SEQ ID NO: 1) is the vector used for sense strand 
precursor oligonucleotide portions and M13IX22 (SEQ ID NO: 
2) is used for anti-sense precursor portions. 

The populations of randomized oligonucleotides 
inserted into the vectors are synthesized with Eco RI and 
Sac I recognition sequences flanking opposite ends of the 
random codon sequences. The sites allow annealing and 
ligation of these single strand oligonucleotides into a 
double stranded vector restricted with Eco RI and Sac I. 
Alternatively, the oligonucleotides can be inserted into 
the vector by standard mutagenesis methods. In this latter 
method, single stranded vector DNA is isolated from the 
phage and annealed with random oligonucleotides having 
known sequences complementary to vector sequences. The 
oligonucleotides are extended with DNA polymerase to 
produce double stranded vectors containing the randomized 
oligonucleotides. 

A vector useful for sense strand oligonucleotide 
portions, M13IX42 (Figure 3B), contains downstream and in 
frame with the Eco RI and Sac I restriction sites a 
sequence encoding the pseudo-wild type gVIII product. This 
gene encodes the wild type M13 gVIII amino acid sequence 
but has been changed at the nucleotide level to reduce 
homologous recombination with the wild type gVIII contained 
on the same vector. The wild type gVIII is present to 
ensure that at least some functional, non-fusion coat 
protein will be produced. The inclusion of a wild type 
gVIII therefore reduces the possibility of non-viable phage 
production and biological selection against certain peptide 
fusion proteins. Differential regulation of the two genes 
can also be used to control the relative ratio of the 
pseudo and wild type proteins. 

Also contained downstream and in frame with the 
Eco RI and Sac I restriction sites is an amber stop codon. 
The mutation is located six codons downstream from Sac I 
and therefore lies between the inserted oligonucleotides 
and the gVIII sequence. Like the wild type gVIII, the 
amber stop codon also reduces biological selection when 
combining precursor portions to produce expressible 
oligonucleotides. This is accomplished by using a 
non-suppressor (sup O) host strain because non-suppressor 
strains will terminate expression after the 
oligonucleotide sequences but before the pseudo gVIII 
sequences. Therefore, the pseudo gVIII will never be 
expressed on the phage surface under these circumstances. 
Instead, only soluble peptides will be produced. 
Expression in a non-suppressor strain can be advantageously 
utilized when one wishes to produce large populations of 
soluble peptides. Stop codons other than amber, such as 
opal and ochre, or molecular switches, such as inducible 
repressor elements, can also be used to unlink peptide 
expression from surface expression. Additional controls 
exist as well and are described below. 

A vector useful for anti-sense strand 
oligonucleotide portions, M13IX22 (Figure 3A), contains 
the expression elements for the peptide fusion proteins. 
Upstream and in frame with the Sac I and Eco RI sites in 
this vector is a leader sequence for surface expression. 
A ribosome binding site and Lac Z promoter/operator 
elements are present for transcription and translation of 
the peptide fusion proteins. 

Both vectors contain a pair of Fok I restriction 
enzyme sites (Figure 3A and B) for joining together two 
precursor oligonucleotide portions and their vector 
sequences. One site is located at the end of each 
precursor oligonucleotide which is to be joined. The 
second Fok I site within the vectors is located at the end 
of the vector sequences which are to be joined. The 5' 
overhang of this second Fok I site has been altered to 
encode a sequence which is not found in the overhangs 
produced at the first Fok I site within the oligonucleotide 
portions. The two sites allow the cleavage of each 
circular vector into two portions and subsequent ligation 
of essential components within each vector into a single 
circular vector where the two oligonucleotide precursor 
portions form a contiguous sequence (Figure 3C). 
Non-compatible overhangs produced at the two Fok I sites 
allow optimal conditions to be selected for performing 
concatemerization or circularization reactions for joining 
the two vector portions. Such selection of conditions can 
be used to govern the reaction order and therefore increase 
the efficiency of joining. 

Fok I is a restriction enzyme whose recognition 
sequence is distal to the point of cleavage. Placement of 
the recognition sequence distal to the cleavage point is 
important since, if the two were superimposed within the 
oligonucleotide portions to be combined, it would lead to 
an invariant codon sequence at the juncture. To avoid the 
formation of invariant codons at the juncture, Fok I 
recognition sequences can be placed outside of the random 
codon sequence and still be used to restrict within the 
random sequence. Subsequent annealing of the single-strand 
overhangs produced by Fok I and ligation of the two 
oligonucleotide precursor portions allows the juncture to 
be formed. A variety of restriction enzymes restrict DNA 
by this mechanism and can be used instead of Fok I to join 
precursor oligonucleotides without creating invariant codon 
sequences. Such enzymes include, for example, Alw I, 
Bbv I, Bsp MI, Hga I, Hph I, Mbo II, Mnl I, Ple I and 
Sfa NI. One skilled in the art knows how to substitute 
Fok I recognition sequences for alternative enzyme 
recognition sequences such as those above, and use the 
appropriate enzyme for joining precursor oligonucleotide 
portions. 
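
The geometry of a distal cleavage site can be sketched in 
Python. The sketch assumes the commonly documented Fok I cut 
positions (recognition sequence GGATG, with cleavage 9 bases 
downstream on the recognition strand and 13 bases downstream on 
the complementary strand); these values are not stated in this 
description. The function name fok_i_cut and the example 
sequence are hypothetical. 

```python
FOK_I_SITE = "GGATG"  # recognition sequence (assumed, not given in the text)
TOP_OFFSET = 9        # bases between the site and the top-strand cut (assumed)
BOTTOM_OFFSET = 13    # bases between the site and the bottom-strand cut (assumed)


def fok_i_cut(top_strand):
    """Locate the first Fok I site and return (top_cut, bottom_cut, overhang).

    Positions are indices into the top strand read 5' to 3'. The overhang is
    the four-base single-stranded stretch left between the two cuts.
    Returns None if no site is found or a cut would fall past the end.
    """
    start = top_strand.find(FOK_I_SITE)
    if start < 0:
        return None
    site_end = start + len(FOK_I_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(top_strand):
        return None
    return top_cut, bottom_cut, top_strand[top_cut:bottom_cut]


# Hypothetical insert: recognition site, 9-base spacer, then variable codons.
example = "AA" + FOK_I_SITE + "ACGTACGTA" + "TTGC" + "AAAA"
print(fok_i_cut(example))
```

Because the overhang lies outside the recognition sequence, any 
four bases can occupy it, which is why the juncture need not 
contain an invariant codon. 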

Although the sequences of the precursor 
oligonucleotides are random and the two precursor 
populations will invariably contain oligonucleotides whose 
sequences are sufficiently complementary to anneal after 
cleavage, the efficiency of annealing can be increased by 
ensuring that the single-strand overhangs within one 
precursor population will have a complementary sequence 
within the second precursor population. This can be 
accomplished by synthesizing a non-degenerate series of 
known sequences at the Fok I cleavage site coding for each 
of the twenty amino acids. Since the Fok I cleavage site 
contains a four base overhang, forty different sequences 
are needed to randomly encode all twenty amino acids. For 
example, if two precursor populations of ten codons in 
length are to be combined, then after the ninth codon 
position is synthesized, the mixed population of supports 
is divided into forty reaction vessels for each of the 
populations, and complementary sequences for each of the 
corresponding reaction vessels between populations are 
independently synthesized. The sequences are shown in 
Tables III and VI of Example I, where the oligonucleotides 
on columns 1R through 40R form complementary overhangs with 
the oligonucleotides on the corresponding columns 1L 
through 40L once cleaved. The degenerate X positions in 
Table VI are necessary to maintain the reading frame once 
the precursor oligonucleotide portions are joined. 
However, restriction enzymes which produce a blunt end, 
such as Mnl I, can alternatively be used in place of 
Fok I to alleviate the degeneracy introduced in maintaining 
the reading frame. 



The last feature exhibited by each of the vectors 
is an amber stop codon located in an essential coding 
sequence within the vector portion lost during combining 
(Figure 3C) . The amber stop codon is present to select for 
viable phage produced from only the proper combination of 
precursor oligonucleotides and their vector sequences into 
a single vector species. Other non-sense mutations or 
selectable markers can work as well. 

The combining step randomly brings together 
different precursor oligonucleotides within the two 
populations into a single vector (Figure 3C; M13IX). For 
example, the vector sequences donated from each independent 
vector described above, M13IX22 and M13IX42, are necessary 
for production of viable phage. Also, since the expression 
elements are contained in M13IX22 and the gVIII sequences 
are contained in M13IX42, expression of functional 
gVIII-peptide fusion proteins cannot be accomplished until 
the sequences are linked as shown in M13IX. 

The combining step is performed by restricting 
each population of vectors containing randomized 
oligonucleotides with Fok I, mixing and ligating (Figure 
3C). Any vectors generated which contain an amber stop 
codon will not produce viable phage when introduced into a 
non-suppressor strain (Figure 3D). Therefore, only the 
sequences which do not contain an amber stop codon will 
make up the final population of vectors contained in the 
library. These vector sequences are the sequences required 
for surface expression of randomized peptides. By 
analogous methodology, more than two vector portions can be 
combined into a single vector which expresses random 
peptides. 

Surface expression of the random peptide library 
is performed in an amber suppressor strain. As described 
above, the amber stop codon between the random codon 
sequence and the gVIII sequence unlinks the two components 
in a non-suppressor strain. Isolating the phage produced 
from the non-suppressor strain and infecting a suppressor 
strain will link the random codon sequences to the gVIII 
sequence during expression (Figure 3E). Culturing the 
suppressor strain after infection allows the expression of 
all peptide species within the library as gVIII-peptide 
fusion proteins. Alternatively, the DNA can be isolated 
from the non-suppressor strain and then introduced into a 
suppressor strain to accomplish the same effect. 

The level of expression of gVIII-peptide fusion 
proteins can additionally be controlled at the 
transcriptional level. The gVIII-peptide fusion proteins 
are under the inducible control of the Lac Z 
promoter/operator system. Other inducible promoters can 
work as well and are known by one skilled in the art. For 
high levels of surface expression, the suppressor library 
is cultured in an inducer of the Lac Z promoter such as 
isopropylthio-β-galactoside (IPTG). Inducible control is 
beneficial because biological selection against 
non-functional gVIII-peptide fusion proteins can be 
minimized by culturing the library under non-expressing 
conditions. Expression can then be induced only at the 
time of screening to ensure that the entire population of 
oligonucleotides within the library is accurately 
represented on the phage surface. Inducible control can 
also be used to control the valency of the peptide on the 
phage surface. 

The surface expression library is screened for 
specific peptides which bind ligand binding proteins by 
standard affinity isolation procedures. Such methods 
include, for example, panning, affinity chromatography and 
solid phase blotting procedures. Panning as described by 
Parmley and Smith, Gene 73:305-318 (1988), which is 
incorporated herein by reference, is preferred because high 
titers of phage can be screened easily, quickly and in 
small volumes. Furthermore, this procedure can select 
minor peptide species within the population, which 
otherwise would have been undetectable, and amplify them to 
substantially homogeneous populations. The selected peptide 
sequences can be determined by sequencing the nucleic acid 
encoding such peptides after amplification of the phage 
population. 

The invention provides a plurality of procaryotic 
cells containing a diverse population of oligonucleotides 
encoding soluble peptides having constrained secondary 
structure in solution, the oligonucleotides being 
operationally linked to expression sequences. The 
invention provides for methods of constructing such 
populations of cells as well. 

Random oligonucleotides synthesized by any of the 
methods described previously can also be expressed on the 
surface of filamentous bacteriophage, such as M13, for 
example, without the joining together of precursor 
oligonucleotides. A vector such as that shown in Figure 4, 
M13IX30, can be used. This vector exhibits all the 
functional features of the combined vector shown in Figure 
3C for surface expression of gVIII-peptide fusion proteins. 
The complete nucleotide sequence for M13IX30 (SEQ ID NO: 3) 
is shown in Figure 7. 

For example, M13IX30 contains a wild type gVIII 
for phage viability and a pseudo gVIII sequence for peptide 
fusions. The vector also contains in frame restriction 
sites for cloning random peptides. The cloning sites in 
this vector are Xho I, Stu I and Spe I. Oligonucleotides 
should therefore be synthesized with the appropriate 
complementary ends for annealing and ligation or 
insertional mutagenesis. Alternatively, the appropriate 
termini can be generated by PCR technology. Between the 
restriction sites and the pseudo gVIII sequence is an 
in-frame amber stop codon, again ensuring complete 
viability of phage in constructing and manipulating the 
library. Expression and screening are performed as 
described above for the surface expression library of 
oligonucleotides generated from precursor portions. 

Thus, peptides can be selected that are capable 
of being bound by a ligand binding protein from a 
population of random peptides by (a) operationally linking 
a diverse population of oligonucleotides having a desirable 
bias of random codon sequences to expression elements; (b) 
introducing said population of vectors into a compatible 
host under conditions sufficient for expressing said 
population of random peptides; and (c) determining the 
peptides which bind to said binding protein. Also provided 
is a method for determining the encoding nucleic acid 
sequence of such selected peptides. 

The following examples are intended to 
illustrate, but not limit, the invention. 

EXAMPLE I 

Isolation and Characterization of Peptide Ligands Generated 
From Right and Left Half Random Oligonucleotides 

This example shows the synthesis of random 
oligonucleotides and the construction and expression of 
surface expression libraries of the encoded randomized 
peptides. The random peptides of this example derive from 
the mixing and joining together of two random 
oligonucleotides. Also demonstrated is the isolation and 
characterization of peptide ligands and their corresponding 
nucleotide sequence for specific binding proteins. 

Synthesis of Random Oligonucleotides 

The synthesis of two randomized oligonucleotides 
which correspond to smaller portions of a larger randomized 
oligonucleotide is shown below. Each of the two smaller 
portions makes up one-half of the larger oligonucleotide. 
The populations of randomized oligonucleotides constituting 
each half are designated the right and left halves. Each 
population of right and left halves is ten codons in 
length with twenty random codons at each position. The 
right half corresponds to the sense sequence of the 
randomized oligonucleotides and encodes the carboxy 
terminal half of the expressed peptides. The left half 
corresponds to the anti-sense sequence of the randomized 
oligonucleotides and encodes the amino terminal half of the 
expressed peptides. The right and left halves of the 
randomized oligonucleotide populations are cloned into 
separate vector species and then mixed and joined so that 
the right and left halves come together in random 
combination to produce a single expression vector species 
which contains a population of randomized oligonucleotides 
twenty codons in length. Electroporation of the vector 
population into an appropriate host produces filamentous 
phage which express the random peptides on their surface. 
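
The theoretical number of sequences implied by this design can 
be worked out directly. The short Python sketch below assumes 
that every one of the twenty codons is equally available at each 
position and that any right half can pair with any left half; it 
gives an upper bound only, since the number of clones actually 
recovered is limited by the physical size of the library. The 
variable names are illustrative. 

```python
CODONS_PER_POSITION = 20  # twenty random codons at each position
HALF_LENGTH = 10          # codons in each of the right and left halves

per_half = CODONS_PER_POSITION ** HALF_LENGTH
combined = per_half ** 2  # any right half can pair with any left half

print(f"{per_half:.3e} sequences per half, {combined:.3e} combined")
```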

The reaction vessels for oligonucleotide 
synthesis were obtained from the manufacturer of the 
automated synthesizer (Millipore, Burlington, MA; supplier 
of MilliGen/Biosearch Cyclone Plus Synthesizer). The 
vessels were supplied as packages containing empty reaction 
columns (1 μmole), frits, crimps and plugs 
(MilliGen/Biosearch catalog # GEN 860458). Derivatized and 
underivatized controlled pore glass, phosphoramidite 
nucleotides, and synthesis reagents were also obtained from 
MilliGen/Biosearch. Crimper and decrimper tools were 
obtained from Fisher Scientific Co., Pittsburgh, PA 
(Catalog numbers 06-406-20 and 06-406-25A, respectively). 

Ten reaction columns were used for right half 
synthesis of random oligonucleotides ten codons in length. 
The oligonucleotides have 5 monomers at their 3' end of the 
sequence 5'GAGCT3' and 8 monomers at their 5' end of the 
sequence 5'AATTCCAT3'. The synthesizer was fitted with a 
column derivatized with a thymine nucleotide (T-column, 
MilliGen/Biosearch # 0615.50) and was programmed to 
synthesize the sequences shown in Table I for each of ten 
columns in independent reaction sets. The sequence of the 
last three monomers (from right to left since synthesis 
proceeds 3' to 5') encodes the indicated amino acids: 

Table I 

Column        Sequence (5' to 3')    Amino Acids 

column 1R     (T/G)TTGAGCT           Phe and Val 
column 2R     (T/C)CTGAGCT           Ser and Pro 
column 3R     (T/C)ATGAGCT           Tyr and His 
column 4R     (T/C)GTGAGCT           Cys and Arg 
column 5R     (C/A)TGGAGCT           Leu and Met 
column 6R     (C/G)AGGAGCT           Gln and Glu 
column 7R     (A/G)CTGAGCT           Thr and Ala 
column 8R     (A/G)ATGAGCT           Asn and Asp 
column 9R     (T/G)GGGAGCT           Trp and Gly 
column 10R    A(T/A)AGAGCT           Ile and Lys 



where the two monomers in parentheses denote a single 
monomer position within the codon and indicate that an 
equal mixture of each monomer was added to the reaction for 
coupling. The monomer coupling reactions for each of the 
10 columns were performed as recommended by the 
manufacturer (amidite version S1.06, # 8400-050990, scale 
1 μM). After the last coupling reaction, the columns were 
washed with acetonitrile and lyophilized to dryness. 
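
The parenthesized notation can be checked mechanically. The 
following Python sketch expands the codon portion of each Table I 
entry (the first three monomer positions, omitting the GAGCT 
flank) into its two concrete codons and translates them with the 
standard genetic code. The helper name expand and the dictionary 
layout are illustrative assumptions. 

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codon portion of each Table I sequence.
TABLE_I = {
    "1R": "(T/G)TT", "2R": "(T/C)CT", "3R": "(T/C)AT", "4R": "(T/C)GT",
    "5R": "(C/A)TG", "6R": "(C/G)AG", "7R": "(A/G)CT", "8R": "(A/G)AT",
    "9R": "(T/G)GG", "10R": "A(T/A)A",
}


def expand(degenerate):
    """Expand a sequence such as '(T/G)TT' into all concrete sequences."""
    parts = re.findall(r"\(([ACGT/]+)\)|([ACGT])", degenerate)
    choices = [group.split("/") if group else [base] for group, base in parts]
    return ["".join(p) for p in product(*choices)]


encoded = set()
for column, codon in TABLE_I.items():
    pairs = [(c, CODON_TABLE[c]) for c in expand(codon)]
    encoded.update(aa for _, aa in pairs)
    print(column, pairs)
print(len(encoded), "distinct amino acids")
```

Under the standard genetic code the ten columns together reach 
twenty distinct amino acids, with the two codons of column 10R, 
ATA and AAA, translating to isoleucine and lysine. 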



Following synthesis, the plugs were removed from 
each column using a decrimper and the reaction products 
were poured into a single weigh boat. Initially the bead 
mass increases due to the weight of the monomers; however, 
at later rounds of synthesis material is lost. In either 
case, the material was equalized with underivatized 
controlled pore glass and mixed thoroughly to obtain a 
random distribution of all twenty codon species. The 
reaction products were then aliquotted into 10 new reaction 
columns by removing 25 mg of material at a time and placing 
it into separate reaction columns. Alternatively, the 
reaction products can be aliquotted by suspending the beads 
in a liquid that is dense enough for the beads to remain 
dispersed, preferably a liquid that is equal in density to 
the beads, and then aliquotting equal volumes of the 
suspension into separate reaction columns. The lip on the 
inside of the columns where the frits rest was cleared of 
material using vacuum suction with a syringe and 25 G 
needle. New frits were placed onto the lips, and the plugs 
were fitted into the columns and crimped into place using a 
crimper. 

Synthesis of the second codon position was 
achieved using the above 10 columns containing the random 
mixture of reaction products from the first codon 
synthesis. The monomer coupling reactions for the second 
codon position are shown in Table II. An A in the first 
position (the 3' end, since synthesis proceeds 3' to 5') 
means that any monomer can be programmed into the 
synthesizer. The monomer at that position is not coupled 
by the synthesizer since the software assumes that the 
monomer is already attached to the column. An A also 
denotes that the columns from the previous codon synthesis 
should be placed on the synthesizer for use in the present 
synthesis round. Reactions were again sequentially 
repeated for each column as shown in Table II and the 
reaction products washed and dried as described above. 

Table II 

Column        Sequence (5' to 3')    Amino Acids 

column 1R     (T/G)TTA               Phe and Val 
column 2R     (T/C)CTA               Ser and Pro 
column 3R     (T/C)ATA               Tyr and His 
column 4R     (T/C)GTA               Cys and Arg 
column 5R     (C/A)TGA               Leu and Met 
column 6R     (C/G)AGA               Gln and Glu 
column 7R     (A/G)CTA               Thr and Ala 
column 8R     (A/G)ATA               Asn and Asp 
column 9R     (T/G)GGA               Trp and Gly 
column 10R    A(T/A)AA               Ile and Lys 



Randomization of the second codon position was 
achieved by removing the reaction products from each of the 
columns and thoroughly mixing the material. The material 
was again divided into new reaction columns and prepared 
for monomer coupling reactions as described above. 
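
The rounds of dividing, coupling and mixing described above can 
be sketched as a small simulation. The Python example below is a 
simplification: each bead is modelled as carrying a single 
growing chain, the pooled beads are divided evenly among the ten 
columns, and each column couples one of its two codons with equal 
probability. The function name split_and_mix, the bead count and 
the number of positions are hypothetical. 

```python
import random
from collections import Counter

# Codon pairs coupled in each of the ten columns (codon portion of Table II).
COLUMN_CODONS = [
    ("TTT", "GTT"), ("TCT", "CCT"), ("TAT", "CAT"), ("TGT", "CGT"),
    ("CTG", "ATG"), ("CAG", "GAG"), ("ACT", "GCT"), ("AAT", "GAT"),
    ("TGG", "GGG"), ("ATA", "AAA"),
]


def split_and_mix(n_beads=10_000, n_positions=3, seed=0):
    """Simulate rounds of divide, couple and pool on individual beads.

    Each bead is modelled as one growing chain, which is a simplification;
    a real support carries many chains.
    """
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]
    for _ in range(n_positions):
        rng.shuffle(beads)  # thorough mixing of the pooled supports
        for i, bead in enumerate(beads):
            column = i % len(COLUMN_CODONS)  # divide evenly into ten columns
            bead.append(rng.choice(COLUMN_CODONS[column]))  # equal mixture
    # Synthesis runs 3' to 5', so later codons lie 5' of earlier ones.
    return ["".join(reversed(bead)) for bead in beads]


oligos = split_and_mix()
first_position = Counter(oligo[-3:] for oligo in oligos)
print(len(first_position), "codon species at the first position")
```

Because the beads are reshuffled before every round, the codon a 
bead receives at one position is independent of the codon it 
received at the previous position, which is the purpose of the 
mixing step. 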

Random synthesis of the next seven codons 
(positions 3 through 9) proceeded identically to the cycle 
described above for the second codon position and again 
used the monomer sequences of Table II. Each of the newly 
repacked columns containing the random mixture of reaction 
products from synthesis of the previous codon position was 
used for the synthesis of the subsequent codon position. 
After synthesis of the codon at position nine and mixing of 
the reaction products, the material was divided and 
repacked into 40 different columns and the monomer 
sequences shown in Table III were coupled to each of the 40 
columns in independent reactions. The oligonucleotides 
from each of the 40 columns were mixed once more and 
cleaved from the controlled pore glass as recommended by 
the manufacturer. 

Table III 

Column        Sequence (5' to 3') 

column 1R     AATTCTTTTA 
column 2R     AATTCTGTTA 
column 3R     AATTCGTTTA 
column 4R     AATTCGGTTA 
column 5R     AATTCTTCTA 
column 6R     AATTCTCCTA 
column 7R     AATTCGTCTA 
column 8R     AATTCGCCTA 
column 9R     AATTCTTATA 
column 10R    AATTCTCATA 
column 11R    AATTCGTATA 
column 12R    AATTCGCATA 
column 13R    AATTCTTGTA 
column 14R    AATTCTCGTA 
column 15R    AATTCGTGTA 
column 16R    AATTCGCGTA 
column 17R    AATTCTCTGA 
column 18R    AATTCTATGA 
column 19R    AATTCGCTGA 
column 20R    AATTCGATGA 
column 21R    AATTCTCAGA 
column 22R    AATTCTGAGA 
column 23R    AATTCGCAGA 
column 24R    AATTCGGAGA 
column 25R    AATTCTACTA 
column 26R    AATTCTGCTA 
column 27R    AATTCGACTA 
column 28R    AATTCGGCTA 
column 29R    AATTCTAATA 
column 30R    AATTCTGATA 
column 31R    AATTCGAATA 
column 32R    AATTCGCATA 
column 33R    AATTCTTGGA 
column 34R    AATTCTGGGA 
column 35R    AATTCGTGGA 
column 36R    AATTCGGGGA 
column 37R    AATTCTATAA 
column 38R    AATTCTAAAA 
column 39R    AATTCGATAA 
column 40R    AATTCGAAAA 



Left half synthesis of random oligonucleotides 
proceeded similarly to the right half synthesis. This half 
of the oligonucleotide corresponds to the anti-sense 
sequence of the encoded randomized peptides. Thus, the 
complementary sequence of the codons in Tables I through 
III are synthesized. The left half oligonucleotides also 
have 5 monomers at their 3' end of the sequence 5'GAGCT3' 
and 8 monomers at their 5' end of the sequence 
5'AATTCCAT3'. The rounds of synthesis, washing, drying, 
mixing, and dividing are as described above. 

For the first codon position, the synthesizer was
fitted with a T-column and programmed to synthesize the
sequences shown in Table IV for each of ten columns in
independent reaction sets. As with right half synthesis,
the sequence of the last three monomers (from right to
left) encode the indicated amino acids:

Table IV

Column          Sequence (5' to 3')     Amino Acids

column 1L       AA(A/C)GAGCT            Phe and Val
column 2L       AG(A/G)GAGCT            Ser and Pro
column 3L       AT(A/G)GAGCT            Tyr and His
column 4L       AC(A/G)GAGCT            Cys and Arg
column 5L       CA(G/T)GAGCT            Leu and Met
column 6L       CT(G/C)GAGCT            Gln and Glu
column 7L       AG(T/C)GAGCT            Thr and Ala
column 8L       AT(T/C)GAGCT            Asn and Asp
column 9L       CC(A/C)GAGCT            Trp and Gly
column 10L      T(A/T)TGAGCT            Ile and Cys
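
The relationship between the anti-sense monomers of Table IV
and the indicated amino acids can be checked with a short
sketch. The Python fragment below is illustrative only. It
assumes the standard genetic code and assumes that the three
5' monomers of each entry form the anti-sense codon; the
function names are hypothetical and are not part of the
described method.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def encoded_residues(antisense_positions):
    """Translate a degenerate anti-sense codon into one-letter residues.

    antisense_positions lists the allowed bases at each of the three
    positions, 5' to 3', e.g. ["A", "A", "AC"] for AA(A/C).
    """
    codons = ("".join(p) for p in product(*antisense_positions))
    return sorted(CODON_TABLE[reverse_complement(c)] for c in codons)


# Column 1L of Table IV, AA(A/C): sense codons TTT and GTT.
print(encoded_residues(["A", "A", "AC"]))  # ['F', 'V'], i.e. Phe and Val
```

Under these assumptions the sense codons for column 1L are
TTT and GTT, which agrees with the Phe and Val entry.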



Following washing and drying, the plugs for each
column were removed, mixed and aliquotted into ten new
reaction columns as described above. Synthesis of the
second codon position was achieved using these ten columns
containing the random mixture of reaction products from the
first codon synthesis. The monomer coupling reactions for
the second codon position are shown in Table V.

Table V

Column          Sequence (5' to 3')     Amino Acids

column 1L       AA(A/C)A                Phe and Val
column 2L       AG(A/G)A                Ser and Pro
column 3L       AT(A/G)A                Tyr and His
column 4L       AC(A/G)A                Cys and Arg
column 5L       CA(G/T)A                Leu and Met
column 6L       CT(G/C)A                Gln and Glu
column 7L       AG(T/C)A                Thr and Ala
column 8L       AT(T/C)A                Asn and Asp
column 9L       CC(A/C)A                Trp and Gly
column 10L      T(A/T)TA                Ile and Cys

Again, randomization of the second codon position
was achieved by removing the reaction products from each of
the columns and thoroughly mixing the beads. The beads
were repacked into ten new reaction columns.

Random synthesis of the next seven codon
positions proceeded identically to the cycle described
above for the second codon position and again used the
monomer sequences of Table V. After synthesis of the codon
at position nine and mixing of the reaction products, the
material was divided and repacked into 40 different columns
and the monomer sequences shown in Table VI were coupled to
each of the 40 columns in independent reactions.
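
The repeated cycle of mixing, dividing and coupling can be
pictured with a small simulation. The sketch below is a
simplified model, not the synthesizer program: it assumes
each bead carries a single chain, couples only the three
codon monomers of each Table V entry (the additional monomer
listed with each entry is ignored), and assumes that chains
grow at their 5' end. All names are hypothetical.

```python
import random

# Degenerate anti-sense codons coupled in the ten columns
# (codon portion of the Table V entries only).
COLUMN_CODONS = [
    ("AAA", "AAC"), ("AGA", "AGG"), ("ATA", "ATG"), ("ACA", "ACG"),
    ("CAG", "CAT"), ("CTG", "CTC"), ("AGT", "AGC"), ("ATT", "ATC"),
    ("CCA", "CCC"), ("TAT", "TTT"),
]


def split_and_pool_round(beads, rng):
    """Mix all beads, divide them among the columns, couple one codon each."""
    rng.shuffle(beads)                        # mixing step
    extended = []
    for index, chain in enumerate(beads):     # bead i goes to column i mod 10
        codon = rng.choice(COLUMN_CODONS[index % len(COLUMN_CODONS)])
        extended.append(codon + chain)        # new monomers join the 5' end
    return extended


def theoretical_diversity(positions, codons_per_position=20):
    """Number of distinct codon strings after the given number of positions."""
    return codons_per_position ** positions


rng = random.Random(0)
beads = ["GAGCT"] * 1000                      # constant 3' sequence on every bead
for _ in range(9):                            # nine randomized codon positions
    beads = split_and_pool_round(beads, rng)

print(len(beads[0]))                          # 5 + 9 * 3 = 32 monomers
print(theoretical_diversity(9))               # 20**9 = 512000000000
```

With ten columns and two codons per column, each position
offers twenty codons, so nine positions allow 20^9 distinct
codon strings before the final Table VI coupling.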

Table VI

Column          Sequence (5' to 3')

column 1L       AATTCCATAAAAXXA
column 2L       AATTCCATAAACXXA
column 3L       AATTCCATAACAXXA
column 4L       AATTCCATAACCXXA
column 5L       AATTCCATAGAAXXA
column 6L       AATTCCATAGACXXA
column 7L       AATTCCATAGGAXXA
column 8L       AATTCCATAGGCXXA
column 9L       AATTCCATATAAXXA
column 10L      AATTCCATATACXXA
column 11L      AATTCCATATGAXXA
column 12L      AATTCCATATGCXXA
column 13L      AATTCCATAGAAXXA
column 14L      AATTCCATACACXXA
column 15L      AATTCCATACGAXXA
column 16L      AATTCCATACGCXXA
column 17L      AATTCCATCAGAXXA
column 18L      AATTCCATCAGCXXA
column 19L      AATTCCATCATAXXA
column 20L      AATTCCATCATCXXA
column 21L      AATTCCATCTGAXXA
column 22L      AATTCCATCTGCXXA
column 23L      AATTCCATCTGAXXA
column 24L      AATTCCATCTCCXXA
column 25L      AATTCCATAGTAXXA
column 26L      AATTCCATAGTCXXA
column 27L      AATTCCATAGCAXXA
column 28L      AATTCCATAGCCXXA
column 29L      AATTCCATATTAXXA
column 30L      AATTCCATATTCXXA
column 31L      AATTCCATATCAXXA
column 32L      AATTCCATATCCXXA
column 33L      AATTCCATCCAAXXA
column 34L      AATTCCATCCACXXA
column 35L      AATTCCATCCCAXXA
column 36L      AATTCCATCCCCXXA
column 37L      AATTCCATTATAXXA
column 38L      AATTCCATTATCXXA
column 39L      AATTCCATTTTAXXA
column 40L      AATTCCATTTTCXXA



The first two monomers denoted by an "X" represent an equal
mixture of all four nucleotides at that position. This is
necessary to retain a relatively unbiased codon sequence at
the junction between right and left half oligonucleotides.
The above right and left half random oligonucleotides were
cleaved and purified from the supports and used in
constructing the surface expression libraries below.

Vector Construction 

Two M13-based vectors, M13IX42 (SEQ ID NO: 1) and
M13IX22 (SEQ ID NO: 2), were constructed for the cloning
and propagation of right and left half populations of
random oligonucleotides, respectively. The vectors were
specially constructed to facilitate the random joining and
subsequent expression of right and left half
oligonucleotide populations. Each vector within the
population contains one right and one left half
oligonucleotide from the population joined together to form
a single contiguous oligonucleotide with random codons
which is twenty-two codons in length. The resultant
population of vectors is used to construct a surface
expression library.

M13IX42, or the right-half vector, was
constructed to harbor the right half populations of
randomized oligonucleotides. M13mp18 (Pharmacia,
Piscataway, NJ) was the starting vector. This vector was
genetically modified to contain, in addition to the encoded
wild type M13 gene VIII already present in the vector: (1)
a pseudo-wild type M13 gene VIII sequence with a stop codon
(amber) placed between it and an Eco RI-Sac I cloning site
for randomized oligonucleotides; (2) a pair of Fok I sites
to be used for joining with M13IX22, the left-half vector;
(3) a second amber stop codon placed on the opposite side
of the vector than the portion being combined with the
left-half vector; and (4) various other mutations to remove
redundant restriction sites and the amino terminal portion
of Lac Z.

The pseudo-wild type M13 gene VIII was used for
surface expression of random peptides. The pseudo-wild
type gene encodes the identical amino acid sequence as that
of the wild type gene; however, the nucleotide sequence has
been altered so that only 63% identity exists between this
gene and the encoded wild type gene VIII. Modification of
the gene VIII nucleotide sequence used for surface
expression reduces the possibility of homologous
recombination with the wild type gene VIII contained on the
same vector. Additionally, the wild type M13 gene VIII was
retained in the vector system to ensure that at least some
functional, non-fusion coat protein would be produced. The
inclusion of wild type gene VIII therefore reduces the
possibility of non-viable phage production from the random
peptide fusion genes.

The pseudo-wild type gene VIII was constructed by
chemically synthesizing a series of oligonucleotides which
encode both strands of the gene. The oligonucleotides are
presented in Table VII (SEQ ID NOS: 7 through 16).

TABLE VII

Pseudo-Wild Type Gene VIII Oligonucleotide Series

Top Strand
Oligonucleotides    Sequence (5' to 3')

VIII 03             GATCC TAG GCT GAA GGC GAT
                    GAG CCT GCT AAG GCT GC
VIII 04             A TTC AAT AGT TTA CAG GCA
                    AGT GCT ACT GAG TAG A
VIII 05             TT GGC TAG GCT TGG GCT ATG
                    GTA GTA GTT ATA GTT
VIII 06             GGT GCT ACC ATA GGG ATT AAA
                    TTA TTC AAA AAG TT
VIII 07             T ACG AGC AAG GCT TCT TA

Bottom Strand
Oligonucleotides    Sequence (5' to 3')

VIII 08             AGC TTA AGA AGC CTT GCT CGT
                    AAA CTT TTT GAA TAA TTT
VIII 09             AAT CCC TAT GGT AGC ACC AAC
                    TAT AAC TAC TAC CAT
VIII 10             AGC CCA AGC GTA GCC AAT GTA
                    CTC AGT AGC ACT TG
VIII 11             C CTG TAA ACT ATT GAA TGC
                    AGC CTT AGC AGG GTC
VIII 12             ATC GCC TTC AGC CTA G

Except for the terminal oligonucleotides VIII 03
(SEQ ID NO: 7) and VIII 08 (SEQ ID NO: 12), the above
oligonucleotides (oligonucleotides VIII 04-VIII 07 and
09-12 (SEQ ID NOS: 8 through 11 and 13 through 16)) were
mixed at 200 ng each in 10 µl final volume and
phosphorylated with T4 polynucleotide kinase (Pharmacia,
Piscataway, NJ) with 1 mM ATP at 37°C for 1 hour. The
reaction was stopped at 65°C for 5 minutes. Terminal
oligonucleotides were added to the mixture and annealed
into double-stranded form by heating to 65°C for 5 minutes,
followed by cooling to room temperature over a period of 30
minutes. The annealed oligonucleotides were ligated
together with 1.0 U of T4 DNA ligase (BRL). The annealed
and ligated oligonucleotides yield a double-stranded DNA
flanked by a Bam HI site at its 5' end and by a Hind III
site at its 3' end. A translational stop codon (amber)
immediately follows the Bam HI site. The gene VIII
sequence begins with the codon GAA (Glu) two codons 3' to
the stop codon. The double-stranded insert was
phosphorylated using T4 DNA kinase (Pharmacia, Piscataway,
NJ) and ATP (10 mM Tris-HCl, pH 7.5, 10 mM MgCl2) and
cloned in frame with the Eco RI and Sac I sites within the
M13 polylinker. To do so, M13mp18 was digested with Bam HI
(New England Biolabs, Beverly, MA) and Hind III (New
England Biolabs) and combined at a molar ratio of 1:10 with
the double-stranded insert. The ligations were performed
at 16°C overnight in 1X ligase buffer (50 mM Tris-HCl, pH
7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA)
containing 1.0 U of T4 DNA ligase (New England Biolabs).
The ligation mixture was transformed into a host and
screened for positive clones using standard procedures in
the art.

Several mutations were generated within the
right-half vector to yield functional M13IX42. The
mutations were generated using the method of Kunkel et al.,
Meth. Enzymol. 154:367-382 (1987), which is incorporated
herein by reference, for site-directed mutagenesis. The
reagents, strains and protocols were obtained from a Bio
Rad Mutagenesis kit (Bio Rad, Richmond, CA) and mutagenesis
was performed as recommended by the manufacturer.

A Fok I site used for joining the right and left
halves was generated 8 nucleotides 5' to the unique Eco RI
site using the oligonucleotide
5'-CTCGAATTCGTACATCCTGGTCATAGC-3' (SEQ ID NO: 17). The
second Fok I site retained in the vector is naturally
encoded at position 3547; however, the sequence within the
overhang was changed to encode CTTC. Two Fok I sites were
removed from the vector at positions 239 and 7244 of
M13mp18 as well as the Hind III site at the end of the
pseudo gene VIII sequence using the mutant oligonucleotides
5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 18) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 19), respectively.
New Hind III and Mlu I sites were also introduced at
position 3919 and 3951 of M13IX42. The oligonucleotides
used for this mutagenesis had the sequences
5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 20) and
5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 21),
respectively. The amino terminal portion of Lac Z was
deleted by oligonucleotide-directed mutagenesis using the
mutant oligonucleotide
5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: 22).
This deletion also removed a third M13mp18 derived Fok I
site. The distance between the Eco RI and Sac I sites was
increased to ensure complete double digestion by inserting
a spacer sequence. The spacer sequence was inserted using
the oligonucleotide
5'-TTCAGCCTAGGATCCGCCGAGCTCTCCTACCTGCGAATTCGTACATCC-3'
(SEQ ID NO: 23). Finally, an amber stop codon was placed
at position 4492 using the mutant oligonucleotide
5'-TGGATTATACTTCTAAATAATGGA-3' (SEQ ID NO: 24). The amber
stop codon is used as a biological selection to ensure the
proper recombination of vector sequences to bring together
right and left halves of the randomized oligonucleotides.
In constructing the above mutations, all changes made in a
M13 coding region were performed such that the amino acid
sequence remained unaltered. It should be noted that
several mutations within M13mp18 were found which differed
from the published sequence. Where known, these sequence
differences are recorded herein as found and therefore may
not correspond exactly to the published sequence of
M13mp18.
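
The restriction sites carried by the mutagenic
oligonucleotides can be located with a simple scan of both
strands. The following Python sketch is illustrative only.
The recognition sequences are the commonly cited ones for
these enzymes and are an assumption here, since they are not
recited in the text; the function names are hypothetical.

```python
# Commonly cited recognition sequences (assumed, written 5' to 3').
SITES = {
    "EcoRI": "GAATTC",
    "SacI": "GAGCTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "MluI": "ACGCGT",
    "FokI": "GGATG",   # non-palindromic, so both strands must be searched
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Map enzyme name to a sorted list of (0-based position, strand) hits."""
    hits = {}
    for enzyme, site in SITES.items():
        last_start = len(seq) - len(site)
        found = [(i, "+") for i in range(last_start + 1)
                 if seq.startswith(site, i)]
        opposite = reverse_complement(site)
        if opposite != site:  # palindromic sites are reported once only
            found += [(i, "-") for i in range(last_start + 1)
                      if seq.startswith(opposite, i)]
        if found:
            hits[enzyme] = sorted(found)
    return hits


# SEQ ID NO: 17, the oligonucleotide that introduces the Fok I site.
print(find_sites("CTCGAATTCGTACATCCTGGTCATAGC"))
# {'EcoRI': [(3, '+')], 'FokI': [(12, '-')]}
```

In this scan the Fok I recognition sequence appears on the
strand complementary to the oligonucleotide as written,
adjacent to the Eco RI site.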

The sequence of the resultant vector, M13IX42, is
shown in Figure 5 (SEQ ID NO: 1). Figure 3A also shows
M13IX42 where each of the elements necessary for producing
a surface expression library between right and left half
randomized oligonucleotides is marked. The sequence
between the two Fok I sites shown by the arrow is the
portion of M13IX42 which is to be combined with a portion
of the left-half vector to produce random oligonucleotides
as fusion proteins of gene VIII.

M13IX22, or the left-half vector, was constructed
to harbor the left half populations of randomized
oligonucleotides. This vector was constructed from M13mp19
(Pharmacia, Piscataway, NJ) and contains: (1) two Fok I
sites for mixing with M13IX42 to bring together the left
and right halves of the randomized oligonucleotides; (2)
sequences necessary for expression such as a promoter and
signal sequence and translation initiation signals; (3) an
Eco RI-Sac I cloning site for the randomized
oligonucleotides; and (4) an amber stop codon for
biological selection in bringing together right and left
half oligonucleotides.

Of the two Fok I sites used for mixing M13IX22
with M13IX42, one is naturally encoded in M13mp18 and
M13mp19 (at position 3547). As with M13IX42, the overhang
within this naturally occurring Fok I site was changed to
CTTC. The other Fok I site was introduced after
construction of the translation initiation signals by
site-directed mutagenesis using the oligonucleotide
5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 25).

The translation initiation signals were
constructed by annealing of overlapping oligonucleotides as
described above to produce a double-stranded insert
containing a 5' Eco RI site and a 3' Hind III site. The
overlapping oligonucleotides are shown in Table VIII (SEQ
ID NOS: 26 through 34) and were ligated as a
double-stranded insert between the Eco RI and Hind III
sites of M13mp18 as described for the pseudo gene VIII
insert. The ribosome binding site (AGGAGAC) is located in
oligonucleotide 015 (SEQ ID NO: 26) and the translation
initiation codon (ATG) is the first three nucleotides of
oligonucleotide 016 (SEQ ID NO: 27).

TABLE VIII

Oligonucleotide Series for Construction of
Translation Signals in M13IX22

Oligonucleotide     Sequence (5' to 3')

015                 AATT C GCC AAG GAG ACA GTC AT
016                 AATG AAA TAG CTA TTG CCT ACG GCA
                    GCC GCT GGA TTG TT
017                 ATTA CTC GCT GCC CAA CCA GCC ATG
                    GCC GAG CTC GTG AT
018                 GACC CAG ACT CCA GATATC CAA CAG
                    GAA TGA GTG TTA AT
019                 TCT AGA ACG CGT C
020                 ACGT G ACG CGT TCT AGA AT TAA
                    CACTCA TTC CTG T
021                 TG GAT ATC TGG AGT CTG GGT CAT
                    CAC GAG CTC GGC CAT G
022                 GC TGG TTG GGC AGC GAG TAA TAA
                    CAA TCC AGC GGC TGC C
023                 GT AGG CAA TAG GTA TTT CAT TAT
                    GAC TGT CCT TGG CG



Oligonucleotide 017 (SEQ ID NO: 27) contained a
Sac I restriction site 67 nucleotides downstream from the
ATG codon. The naturally occurring Eco RI site was removed
and a new site introduced 25 nucleotides downstream from
the Sac I site. Oligonucleotides
5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 35) and
5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 36)
were used to generate each of the mutations, respectively.
An amber stop codon was also introduced at position 3263 of
M13mp18 using the oligonucleotide
5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO: 37).

In addition to the above mutations, a variety of
other modifications were made to remove certain sequences
and redundant restriction sites. The Lac Z ribosome
binding site was removed when the original Eco RI site in
M13mp18 was mutated. Also, the Fok I sites at positions
239, 6361 and 7244 of M13mp18 were likewise removed with
mutant oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID
NO: 38), 5'-CGAAAGGGGGGTGTGCTGCAA-3' (SEQ ID NO: 39) and
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 40), respectively.
Again, mutations within the coding region did not alter the
amino acid sequence.

The resultant vector, M13IX22, is 7320 base pairs
in length, the sequence of which is shown in Figure 6 (SEQ
ID NO: 2). The Sac I and Eco RI cloning sites are at
positions 6290 and 6314, respectively. Figure 3A also
shows M13IX22 where each of the elements necessary for
producing a surface expression library between right and
left half randomized oligonucleotides is marked.

Library Construction

Each population of right and left half randomized
oligonucleotides from columns 1R through 40R and columns 1L
through 40L is cloned separately into M13IX42 and M13IX22,
respectively, to create sublibraries of right and left half
randomized oligonucleotides. Therefore, a total of eighty
sublibraries are generated. Separately maintaining each
population of randomized oligonucleotides until the final
screening step is performed to ensure maximum efficiency of
annealing of right and left half oligonucleotides. The
greater efficiency increases the total number of randomized
oligonucleotides which can be obtained. Alternatively, one
can combine all forty populations of right half
oligonucleotides (columns 1R-40R) into one population and
of left half oligonucleotides (columns 1L-40L) into a
second population to generate just one sublibrary for each.

For the generation of sublibraries, each of the
above populations of randomized oligonucleotides is cloned
separately into the appropriate vector. The right half
oligonucleotides are cloned into M13IX42 to generate
sublibraries M13IX42.1R through M13IX42.40R. The left half
oligonucleotides are similarly cloned into M13IX22 to
generate sublibraries M13IX22.1L through M13IX22.40L. Each
vector contains unique Eco RI and Sac I restriction enzyme
sites which produce 5' and 3' single-stranded overhangs,
respectively, when digested. The single strand overhangs
are used for the annealing and ligation of the
complementary single-stranded random oligonucleotides.

The randomized oligonucleotide populations are
cloned between the Eco RI and Sac I sites by sequential
digestion and ligation steps. Each vector is treated with
an excess of Eco RI (New England Biolabs) at 37°C for 2
hours followed by addition of 4-24 units of calf intestinal
alkaline phosphatase (Boehringer Mannheim, Indianapolis,
IN). Reactions are stopped by phenol/chloroform extraction
and ethanol precipitation. The pellets are resuspended in
an appropriate amount of distilled or deionized water
(dH2O). About 10 pmol of vector is mixed with a 5000-fold
molar excess of each population of randomized
oligonucleotides in 10 µl of 1X ligase buffer (50 mM
Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50
µg/ml BSA) containing 1.0 U of T4 DNA ligase (BRL,
Gaithersburg, MD). The ligation is incubated at 16°C for
16 hours. Reactions are stopped by heating at 75°C for 15
minutes and the DNA is digested with an excess of Sac I
(New England Biolabs) for 2 hours. Sac I is inactivated by
heating at 75°C for 15 minutes and the volume of the
reaction mixture is adjusted to 300 µl with an appropriate
amount of 10X ligase buffer and dH2O. One unit of T4 DNA
ligase (BRL) is added and the mixture is incubated
overnight at 16°C. The DNA is ethanol precipitated and
resuspended in TE (10 mM Tris-HCl, pH 8.0, 1 mM EDTA). DNA
from each ligation is electroporated into XL1 Blue™ cells
(Stratagene, La Jolla, CA), as described below, to generate
the sublibraries.

E. coli XL1 Blue™ is electroporated as described
by Smith et al., Focus 12:38-40 (1990) which is
incorporated herein by reference. The cells are prepared
by inoculating a fresh colony of XL1s into 5 mls of SOB
without magnesium (20 g bacto-tryptone, 5 g bacto-yeast
extract, 0.584 g NaCl, 0.186 g KCl, dH2O to 1,000 mls) and
grown with vigorous aeration overnight at 37°C. SOB
without magnesium (500 ml) is inoculated at 1:1000 with the
overnight culture and grown with vigorous aeration at 37°C
until the OD550 is 0.8 (about 2 to 3 h). The cells are
harvested by centrifugation at 5,000 rpm (2,600 x g) in a
GS3 rotor (Sorvall, Newtown, CT) at 4°C for 10 minutes,
resuspended in 500 ml of ice-cold 10% (v/v) sterile
glycerol and centrifuged and resuspended a second time in
the same manner. After a third centrifugation, the cells
are resuspended in 10% sterile glycerol at a final volume
of about 2 ml, such that the OD550 of the suspension is 200
to 300. Usually, resuspension is achieved in the 10%
glycerol that remains in the bottle after pouring off the
supernate. Cells are frozen in 40 µl aliquots in
microcentrifuge tubes using a dry ice-ethanol bath and
stored frozen at -70°C.

Frozen cells are electroporated by thawing slowly
on ice before use and mixing with about 10 pg to 500 ng of
vector per 40 µl of cell suspension. A 40 µl aliquot is
placed in an 0.1 cm electroporation chamber (Bio-Rad,
Richmond, CA) and pulsed once at 0°C using 200 Ω parallel
resistor, 25 µF, 1.88 kV, which gives a pulse length (τ) of
~4 ms. A 10 µl aliquot of the pulsed cells is diluted
into 1 ml SOC (98 mls SOB plus 1 ml of 2 M MgCl2 and 1 ml
of 2 M glucose) in a 12- x 75-mm culture tube, and the
culture is shaken at 37°C for 1 hour prior to culturing in
selective media (see below).
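
The pulse settings can be related to the stated pulse length
with the usual RC time constant. The sketch below is a
rough worked calculation, not a description of the
instrument: the sample resistance of 800 Ω is a hypothetical
value chosen only to show how a sample in parallel with the
200 Ω resistor shortens the pulse, and the function names
are illustrative.

```python
def time_constant_ms(resistance_ohm, capacitance_uF):
    """RC time constant in milliseconds."""
    return resistance_ohm * capacitance_uF * 1e-6 * 1e3


def parallel(r1_ohm, r2_ohm):
    """Combined resistance of two resistors in parallel."""
    return r1_ohm * r2_ohm / (r1_ohm + r2_ohm)


def field_strength_kV_per_cm(voltage_kV, gap_cm):
    """Nominal field strength across the electroporation chamber."""
    return voltage_kV / gap_cm


# Settings given in the text: 1.88 kV across a 0.1 cm chamber,
# 200 ohm parallel resistor, 25 uF capacitor.
print(round(field_strength_kV_per_cm(1.88, 0.1), 1))   # kV/cm
print(round(time_constant_ms(200, 25), 2))             # resistor alone, ms

# Hypothetical sample resistance (assumption, not from the text).
SAMPLE_OHM = 800
print(round(time_constant_ms(parallel(200, SAMPLE_OHM), 25), 2))
```

The resistor alone gives a 5 ms time constant; any finite
sample resistance in parallel lowers it, which is consistent
with the observed pulse length of about 4 ms.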

Each of the eighty sublibraries is cultured
using methods known to one skilled in the art. Such
methods can be found in Sambrook et al., Molecular Cloning:
A Laboratory Manual, Cold Spring Harbor Laboratory, Cold
Spring Harbor, 1989, and in Ausubel et al., Current
Protocols in Molecular Biology, John Wiley and Sons, New
York, 1989, both of which are incorporated herein by
reference. Briefly, the above 1 ml sublibrary cultures
were grown up by diluting 50-fold into 2XYT media (16 g
tryptone, 10 g yeast extract, 5 g NaCl) and culturing at
37°C for 5-8 hours. The bacteria were pelleted by
centrifugation at 10,000 x g. The supernatant containing
phage was transferred to a sterile tube and stored at 4°C.

Double strand vector DNA containing right and
left half randomized oligonucleotide inserts is isolated
from the cell pellet of each sublibrary. Briefly, the
pellet is washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and
recollected by centrifugation at 7,000 rpm for 5 minutes in
a Sorvall centrifuge (Newtown, CT). Pellets are resuspended
in 6 mls of 10% sucrose, 50 mM Tris, pH 8.0. 3.0 ml of 10
mg/ml lysozyme is added and incubated on ice for 20
minutes. 12 mls of 0.2 M NaOH, 1% SDS is added followed by
10 minutes on ice. The suspensions are then incubated on
ice for 20 minutes after addition of 7.5 mls of 3 M NaOAc,
pH 4.6. The samples are centrifuged at 15,000 rpm for 15
minutes at 4°C, RNased and extracted with
phenol/chloroform, followed by ethanol precipitation. The
pellets are resuspended, weighed and an equal weight of
CsCl is dissolved into each tube until a density of 1.60
g/ml is achieved. EtBr is added to 600 µg/ml and the
double-stranded DNA is isolated by equilibrium
centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm
for 6 hours. These DNAs from each right and left half
sublibrary are used to generate forty libraries in which
the right and left halves of the randomized
oligonucleotides have been randomly joined together.

Each of the forty libraries is produced by
joining together one right half and one left half
sublibrary. The two sublibraries joined together
correspond to the same column number for right and left
half random oligonucleotide synthesis. For example,
sublibrary M13IX42.1R is joined with M13IX22.1L to produce
the surface expression library M13IX.1RL. In the
alternative situation where only two sublibraries are
generated from the combined populations of all right half
synthesis and all left half synthesis, only one surface
expression library would be produced.
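
The pairing of sublibraries by column number can be written
out explicitly. The short Python sketch below simply
enumerates the names following the pattern given in the
example above (M13IX42.1R with M13IX22.1L giving M13IX.1RL);
the function name is hypothetical.

```python
RIGHT_VECTOR = "M13IX42"
LEFT_VECTOR = "M13IX22"


def paired_libraries(columns=40):
    """List (right sublibrary, left sublibrary, joined library) per column."""
    return [
        (f"{RIGHT_VECTOR}.{n}R", f"{LEFT_VECTOR}.{n}L", f"M13IX.{n}RL")
        for n in range(1, columns + 1)
    ]


libraries = paired_libraries()
print(len(libraries))   # 40 surface expression libraries
print(libraries[0])     # ('M13IX42.1R', 'M13IX22.1L', 'M13IX.1RL')
```

Eighty sublibraries therefore give forty surface expression
libraries, one per column number.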

For the random joining of each right and left
half oligonucleotide populations into a single surface
expression vector species, the DNAs isolated from each
sublibrary are digested with an excess of Fok I (New
England Biolabs). The reactions are stopped by
phenol/chloroform extraction, followed by ethanol
precipitation. Pellets are resuspended in dH2O. Each
surface expression library is generated by ligating equal
molar amounts (5-10 pmol) of Fok I digested DNA isolated
from corresponding right and left half sublibraries in 10
µl of 1X ligase buffer containing 1.0 U of T4 DNA ligase
(Bethesda Research Laboratories, Gaithersburg, MD). The
ligations proceed overnight at 16°C and are electroporated
into the sup O strain MK30-3 (Boehringer Mannheim
Biochemical (BMB), Indianapolis, IN) as previously
described for XL1 cells. Because MK30-3 is sup O, only the
vector portions encoding the randomized oligonucleotides
which come together will produce viable phage.

Screening of Surface Expression Libraries 

Purified phage are prepared from 50 ml liquid
cultures of XL1 Blue™ cells (Stratagene) which are infected
at a m.o.i. of 10 from the phage stocks stored at 4°C. The
cultures are induced with 2 mM IPTG. Supernatants from all
cultures are combined and cleared by two centrifugations,
and the phage are precipitated by adding 1/7.5 volumes of
PEG solution (25% PEG-8000, 2.5 M NaCl), followed by
incubation at 4°C overnight. The precipitate is recovered
by centrifugation for 90 minutes at 10,000 x g. Phage
pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH
7.6, 1.0 mM EDTA, and 0.1% Sarkosyl and then shaken slowly
at room temperature for 30 minutes. The solutions are
adjusted to 0.5 M NaCl and to a final concentration of 5%
polyethylene glycol. After 2 hours at 4°C, the
precipitates containing the phage are recovered by
centrifugation for 1 hour at 15,000 x g. The precipitates
are resuspended in 10 ml of NET buffer (0.1 M NaCl, 1.0 mM
EDTA, and 0.01 M Tris-HCl, pH 7.6), mixed well, and the
phage repelleted by centrifugation at 170,000 x g for 3
hours. The phage pellets are subsequently resuspended
overnight in 2 ml of NET buffer and subjected to cesium
chloride centrifugation for 18 hours at 110,000 x g (3.86
g of cesium chloride in 10 ml of buffer). Phage bands are
collected, diluted 7-fold with NET buffer, recentrifuged at
170,000 x g for 3 hours, resuspended, and stored at 4°C in
0.3 ml of NET buffer containing 0.1 mM sodium azide.
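
The final concentrations reached in the first PEG
precipitation follow from the stated volume ratio. The
Python sketch below is a worked calculation under the
assumption that volumes are additive and that the cleared
supernatant contributes no PEG and negligible NaCl; the
function names are hypothetical.

```python
def added_volume_ml(sample_ml, volume_ratio):
    """Volume of stock solution added for a given stock:sample ratio."""
    return sample_ml * volume_ratio


def final_concentration(stock_concentration, volume_ratio):
    """Concentration after adding volume_ratio volumes of stock to 1 volume."""
    return stock_concentration * volume_ratio / (1.0 + volume_ratio)


RATIO = 1 / 7.5                                    # 1/7.5 volumes of PEG solution

print(round(added_volume_ml(50, RATIO), 2))        # ml of PEG solution per 50 ml
print(round(final_concentration(25.0, RATIO), 2))  # final % PEG-8000
print(round(final_concentration(2.5, RATIO), 3))   # final M NaCl from the stock
```

Under these assumptions about 6.7 ml of PEG solution is
added per 50 ml of supernatant, giving roughly 2.9% PEG and
0.29 M NaCl.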

Ligand binding proteins used for panning on
streptavidin coated dishes are first biotinylated and then
absorbed against UV-inactivated blocking phage (see below).
The biotinylating reagents are dissolved in
dimethylformamide at a ratio of 2.4 mg solid NHS-SS-Biotin
(sulfosuccinimidyl 2-(biotinamido)ethyl-1,3'-
dithiopropionate; Pierce, Rockford, IL) to 1 ml solvent and
used as recommended by the manufacturer. Small-scale
reactions are accomplished by mixing 1 µl dissolved reagent
with 43 µl of 1 mg/ml ligand binding protein diluted in
sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2
hours at 25°C, residual biotinylating reagent is reacted
with 500 µl 1 M ethanolamine (pH adjusted to 9 with HCl)
for an additional 2 hours. The entire sample is diluted
with 1 ml TBS containing 1 mg/ml BSA, concentrated to about
50 µl on a Centricon 30 ultra-filter (Amicon), and washed
on the same filter three times with 2 ml TBS and once with
1 ml TBS containing 0.02% NaN3 and 7 x 10" UV-inactivated
blocking phage (see below); the final retentate (60-80 µl)
is stored at 4°C. Ligand binding proteins biotinylated
with the NHS-SS-Biotin reagent are linked to biotin via a
disulfide-containing chain.

UV-irradiated M13 phage were used for blocking
binding proteins which fortuitously bound filamentous phage
in general. M13mp8 (Messing and Vieira, Gene 19:262-276
(1982), which is incorporated herein by reference) was
chosen because it carries two amber stop codons, which
ensure that the few phage surviving irradiation will not
grow in the sup O strains used to titer the surface
expression libraries. A 5 ml sample containing 5 x 10"
M13mp8 phage, purified as described above, was placed in a
small petri plate and irradiated with a germicidal lamp at
a distance of two feet for 7 minutes (flux 150 µW/cm²).
NaN3 was added to 0.02% and phage particles concentrated to
10" particles/ml on a Centricon 30-kDa ultrafilter
(Amicon).
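
The UV dose delivered to the blocking phage follows directly
from the stated flux and exposure time. The Python sketch
below is a simple unit conversion, assuming the flux is
constant over the plate for the whole exposure; the function
name is hypothetical.

```python
def uv_dose_mJ_per_cm2(flux_uW_per_cm2, minutes):
    """Dose = flux x time, converted from microjoules to millijoules."""
    seconds = minutes * 60
    return flux_uW_per_cm2 * seconds / 1000


print(uv_dose_mJ_per_cm2(150, 7))   # 63.0 mJ/cm2 for 150 uW/cm2 over 7 minutes
```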

For panning, polystyrene petri plates (60 x 15
mm, Falcon; Becton Dickinson, Lincoln Park, NJ) are
incubated with 1 ml of 1 mg/ml of streptavidin (BMB) in 0.1
M NaHCO3 pH 8.6-0.02% NaN3 in a small, air-tight plastic
box overnight in a cold room. The next day streptavidin is
removed and replaced with at least 10 ml blocking solution
(29 mg/ml of BSA; 3 µg/ml of streptavidin; 0.1 M NaHCO3 pH
8.6-0.02% NaN3) and incubated at least 1 hour at room
temperature. The blocking solution is removed and plates
are washed rapidly three times with Tris buffered saline
containing 0.5% Tween 20 (TBS-0.5% Tween 20).

Selection of phage expressing peptides bound by
the ligand binding proteins is performed with 5 µl (2.7 µg
ligand binding protein) of blocked biotinylated ligand
binding proteins reacted with a 50 µl portion of each
library. Each mixture is incubated overnight at 4°C,
diluted with 1 ml TBS-0.5% Tween 20, and transferred to a
streptavidin-coated petri plate prepared as described
above. After rocking 10 minutes at room temperature,
unbound phage are removed and plates washed ten times with
TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound
phage are eluted from plates with 800 µl sterile elution
buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with
glycine) for 15 minutes and eluates neutralized with 48 µl
2 M Tris (pH unadjusted). A 20 µl portion of each eluate
is titered on MK30-3 concentrated cells with dilutions of
input phage.
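
Titering the eluate alongside dilutions of the input phage
allows the recovery from a round of panning to be expressed
as a percentage of the input. The Python sketch below shows
one way such a calculation might be set up. The plaque
counts and dilution factors are hypothetical placeholders,
not data from this example, and the function names are
illustrative; the volumes (800 µl eluate plus 48 µl Tris,
50 µl library portion) are those given above.

```python
def titer_pfu_per_ml(plaques, dilution_factor, plated_volume_ml):
    """Titer of the undiluted stock from one countable plate."""
    return plaques * dilution_factor / plated_volume_ml


def percent_yield(eluate_titer, eluate_volume_ml, input_titer, input_volume_ml):
    """Eluted phage as a percentage of the phage put into the panning."""
    eluted = eluate_titer * eluate_volume_ml
    applied = input_titer * input_volume_ml
    return 100.0 * eluted / applied


# Hypothetical plate counts (assumptions for illustration only).
eluate_titer = titer_pfu_per_ml(plaques=120, dilution_factor=1e3,
                                plated_volume_ml=0.1)
input_titer = titer_pfu_per_ml(plaques=85, dilution_factor=1e9,
                               plated_volume_ml=0.1)

ELUATE_ML = 0.800 + 0.048   # elution buffer plus neutralizing Tris
INPUT_ML = 0.050            # library portion used for the selection

print(percent_yield(eluate_titer, ELUATE_ML, input_titer, INPUT_ML))
```

Comparing this percentage between the first and second
eluates gives a rough measure of enrichment for binding
phage.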

A second round of panning is performed by
treating 750 µl of first eluate from each library with 5 mM
DTT for 10 minutes to break disulfide bonds linking biotin
groups to residual biotinylated binding proteins. The
treated eluate is concentrated on a Centricon 30
ultrafilter (Amicon), washed three times with TBS-0.5%
Tween 20, and concentrated to a final volume of about 50
µl. Final retentate is transferred to a tube containing
5.0 µl (2.7 µg ligand binding protein) blocked biotinylated
ligand binding proteins and incubated overnight. The
solution is diluted with 1 ml TBS-0.5% Tween 20, panned,
and eluted as described above on fresh streptavidin-coated
petri plates. The entire second eluate (800 µl) is
neutralized with 48 µl 2 M Tris, and 20 µl is titered
simultaneously with the first eluate and dilutions of the
input phage.

Individual phage populations are purified through
2 to 3 rounds of plaque purification. Briefly, the second
eluate titer plates are lifted with nitrocellulose filters
(Schleicher & Schuell, Inc., Keene, NH) and processed by
washing for 15 minutes in TBS (10 mM Tris-HCl, pH 7.2, 150
mM NaCl), followed by an incubation with shaking for an
additional 1 hour at 37°C with TBS containing 5% nonfat dry
milk (TBS-5% NDM) at 0.5 ml/cm². The wash is discarded and
fresh TBS-5% NDM is added (0.1 ml/cm²) containing the ligand
binding protein between 1 nM to 100 mM, preferably between
1 to 100 µM. All incubations are carried out in
heat-sealable pouches (Sears). Incubation with the ligand
binding protein proceeds for 12-16 hours at 4°C with
shaking. The filters are removed from the bags and washed
3 times for 30 minutes at room temperature with 150 ml of
TBS containing 0.1% NDM and 0.2% NP-40 (Sigma, St. Louis,
MO). The filters are then incubated for 2 hours at room
temperature in antiserum against the ligand binding protein
at an appropriate dilution in TBS-0.5% NDM, washed in 3
changes of TBS containing 0.1% NDM and 0.2% NP-40 as
described above and incubated in TBS containing 0.1% NDM

35 and 0.2% NP-40 with 1 x 10^ cpm of "^I-labeled Protein A 
(specific activity = 2.1 x 10"^ cpm/jjg) . After a washing 
with TBS containing 0.1% NDM and 0.2% NP-40 as described
above, the filters are wrapped in Saran Wrap and exposed to
Kodak X-Omat x-ray film (Kodak, Rochester, NY) for 1-12
hours at -70°C using Dupont Cronex Lightning Plus
Intensifying Screens (Dupont, Wilmington, DE).

Positive plaques identified are cored with the
large end of a pasteur pipet and placed into 1 ml of SM
(5.8 g NaCl, 2 g MgSO4·7H2O, 50 ml 1 M Tris-HCl, pH 7.5, 5
ml 2% gelatin, to 1000 ml with dH2O) plus 1-3 drops of
CHCl3 and incubated at 37°C for 2-3 hours or overnight at 4°C.
The phage are diluted 1:500 in SM and 2 µl are added to 300
µl of XL1 cells plus 3 ml of soft agar per 100 mm² plate.
The XL1 cells are prepared for plating by growing a colony
overnight in 10 ml LB (10 g bacto-tryptone, 5 g bacto-yeast
extract, 10 g NaCl, 1000 ml dH2O) containing 100 µl of 20%
maltose and 100 µl of 1 M MgSO4. The bacteria are pelleted
by centrifugation at 2000 x g for 10 minutes and the pellet
is resuspended gently in 10 ml of 10 mM MgSO4. The
suspension is diluted 4-fold by adding 30 ml of 10 mM MgSO4
to give an OD600 of approximately 0.5. The second and third
round screens are identical to that described above except
that the plaques are cored with the small end of a pasteur
pipet and placed into 0.5 ml SM plus a drop of CHCl3, and
1-5 µl of the phage following incubation are used for plating
without dilution. At the end of the third round of
purification, an individual plaque is picked and the
templates prepared for sequencing.

Template Preparation and Sequencing 

Templates are prepared for sequencing by
inoculating a 1 ml culture of 2XYT containing a 1:100
dilution of an overnight culture of XL1 with an individual
plaque. The plaques are picked using a sterile toothpick.
The culture is incubated at 37°C for 5-6 hours with shaking
and then transferred to a 1.5 ml microfuge tube. 200 µl of
PEG solution is added, and the tube is vortexed and placed on
ice for 10 minutes. The phage precipitate is recovered by
centrifugation in a microfuge at 12,000 x g for 5 minutes.
The supernatant is discarded and the pellet is resuspended
in 230 µl of TE (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) by
gently pipeting with a yellow pipet tip. Phenol (200 µl)
is added, and the tube is briefly vortexed and microfuged to
separate the phases. The aqueous phase is transferred to
a separate tube and extracted with 200 µl of
phenol/chloroform (1:1) as described above for the phenol
extraction. A 0.1 volume of 3 M NaOAc is added, followed
by addition of 2.5 volumes of ethanol, and the templates are
precipitated at -20°C for 20 minutes. The precipitated
templates are recovered by centrifugation in a microfuge at
12,000 x g for 8 minutes. The pellet is washed in 70%
ethanol, dried and resuspended in 25 µl TE. Sequencing was
performed using a Sequenase™ sequencing kit following the
protocol supplied by the manufacturer (U.S. Biochemical,
Cleveland, OH).



EXAMPLE II

Isolation and Characterization of Peptide Ligands Generated
From Oligonucleotides Having Random Codons at Two
Predetermined Positions

This example shows the generation of a surface
expression library from a population of oligonucleotides
having randomized codons. The oligonucleotides are ten
codons in length and are cloned into a single vector
species for the generation of an M13 gene VIII-based surface
expression library. The example also shows the selection
of peptides for a ligand binding protein and
characterization of their encoded nucleic acid sequences.

Oligonucleotide Synthesis

Oligonucleotides were synthesized as described in
Example I. The synthesizer was programmed to synthesize
the sequences shown in Table IX. These sequences
correspond to the first random codon position synthesized
and 3' flanking sequences of the oligonucleotide which
hybridizes to the leader sequence in the vector. The
complementary sequences are used for insertional
mutagenesis of the synthesized population of
oligonucleotides.



Table IX

Column       Sequence (5' to 3')
column 1     AA(A/C)GGTTGGTCGGTACCGG
column 2     AG(A/G)GGTTGGTCGGTACCGG
column 3     AT(A/G)GGTTGGTCGGTACCGG
column 4     AC(A/G)GGTTGGTCGGTACCGG
column 5     CA(G/T)GGTTGGTCGGTACCGG
column 6     CT(G/C)GGTTGGTCGGTACCGG
column 7     AG(T/C)GGTTGGTCGGTACCGG
column 8     AT(T/C)GGTTGGTCGGTACCGG
column 9     CC(A/C)GGTTGGTCGGTACCGG
column 10    T(A/T)TGGTTGGTCGGTACCGG
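The codon mixtures in Table IX can be checked with a short script. The following is only an illustrative sketch: it assumes that the listed column sequences are read as the antisense strand (as stated for the populations of Example III), so each variant triplet is reverse-complemented before being translated with the standard genetic code. The function and variable names are hypothetical and are not part of the disclosure.

```python
from itertools import product

# First codon position of each column in Table IX, written 5' to 3';
# parentheses mark an equal mixture of two bases.
TABLE_IX_CODONS = [
    "AA(A/C)", "AG(A/G)", "AT(A/G)", "AC(A/G)", "CA(G/T)",
    "CT(G/C)", "AG(T/C)", "AT(T/C)", "CC(A/C)", "T(A/T)T",
]

# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(pattern):
    """Return every plain sequence described by a pattern such as 'AA(A/C)'."""
    choices = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "(":
            close = pattern.index(")", i)
            choices.append(pattern[i + 1:close].split("/"))
            i = close + 1
        else:
            choices.append([pattern[i]])
            i += 1
    return ["".join(p) for p in product(*choices)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def encoded_amino_acids(patterns):
    """Map each column pattern to the amino acids of its sense codons.

    Assumption: the patterns are antisense, so each expanded triplet is
    reverse-complemented before lookup.
    """
    return {
        pat: [CODON_TABLE[reverse_complement(c)] for c in expand(pat)]
        for pat in patterns
    }


if __name__ == "__main__":
    for pat, aas in encoded_amino_acids(TABLE_IX_CODONS).items():
        print(pat, "->", "/".join(aas))
```

Under the stated assumption, the ten columns of two codons each specify twenty different amino acids and no stop codon.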



The next eight random codon positions were
synthesized as described for Table V in Example I.
Following the ninth position synthesis, the reaction
products were once more combined, mixed and redistributed
into 10 new reaction columns. Synthesis of the last random
codon position and 5' flanking sequences is shown in Table
X.

Table X

Column       Sequence (5' to 3')
column 1     AGGATCCGCCGAGCTCAA(A/C)A
column 2     AGGATCCGCCGAGCTCAG(A/G)A
column 3     AGGATCCGCCGAGCTCAT(A/G)A
column 4     AGGATCCGCCGAGCTCAC(A/G)A
column 5     AGGATCCGCCGAGCTCCA(G/T)A
column 6     AGGATCCGCCGAGCTCCT(G/C)A
column 7     AGGATCCGCCGAGCTCAG(T/C)A
column 8     AGGATCCGCCGAGCTCAT(T/C)A
column 9     AGGATCCGCCGAGCTCCC(A/C)A
column 10    AGGATCCGCCGAGCTCT(A/T)TA



The reaction products were mixed once more and
the oligonucleotides cleaved and purified as recommended by
the manufacturer. The purified population of
oligonucleotides was used to generate a surface expression
library as described below.



Vector Construction 



The vector used for generating surface expression
libraries from a single oligonucleotide population (i.e.,
without joining together of right and left half
oligonucleotides) is described below. The vector is an
M13-based expression vector which directs the synthesis of
gene VIII-peptide fusion proteins (Figure 4). This vector
exhibits all the functions that the combined right and left
half vectors of Example I exhibit.

An M13-based vector was constructed for the
cloning and surface expression of populations of random
oligonucleotides (Figure 4, M13IX30). M13mp19 (Pharmacia)
was the starting vector. This vector was modified to
contain, in addition to the encoded wild type M13 gene
VIII: (1) a pseudo-wild type gene VIII sequence with
an amber stop codon placed between it and the restriction
sites for cloning oligonucleotides; (2) Stu I, Spe I and
Xho I restriction sites in frame with the pseudo-wild type
gVIII for cloning oligonucleotides; (3) sequences necessary
for expression, such as a promoter, signal sequence and
translation initiation signals; (4) various other mutations
to remove redundant restriction sites and the amino
terminal portion of Lac Z.

Construction of M13IX30 was performed in four
steps. In the first step, a precursor vector containing
the pseudo gene VIII and various other mutations was
constructed, M13IX01F. The second step involved the
construction of a small cloning site in a separate M13mp18
vector to yield M13IX03. In the third step, expression
sequences and cloning sites were constructed in M13IX03 to
generate the intermediate vector M13IX04B. The fourth step
involved the incorporation of the newly constructed
sequences from the intermediate vector into M13IX01F to
yield M13IX30. Incorporation of these sequences linked
them with the pseudo gene VIII.

Construction of the precursor vector M13IX01F was
similar to that of M13IX42 described in Example I except
for the following features: (1) M13mp19 was used as the
starting vector; (2) the Fok I site 5' to the unique Eco
RI site was not incorporated and the overhang at the
naturally occurring Fok I site at position 3547 was not
changed to 5'-CTTC-3'; (3) the spacer sequence was not
incorporated between the Eco RI and Sac I sites; and (4)
the amber codon at position 4492 was not incorporated.

In the second step, M13mp18 was mutated to remove
the 5' end of Lac Z up to the Lac i binding site and
including the Lac Z ribosome binding site and start codon.
Additionally, the polylinker was removed and a Mlu I site
was introduced in the coding region of Lac Z. A single
oligonucleotide was used for these mutations and had the
sequence "5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3'"
(SEQ ID NO: 41). Restriction enzyme sites for Hind III
and Eco RI were introduced downstream of the Mlu I site
using the oligonucleotide
"5'-GGCGAAAGGGAATTCTGCAAGGCGATTAAGCTTGGGTAACGCC-3'" (SEQ ID NO:
42). These modifications of M13mp18 yielded the vector
M13IX03.

The expression sequences and cloning sites were
introduced into M13IX03 by chemically synthesizing a series
of oligonucleotides which encode both strands of the
desired sequence. The oligonucleotides are presented in
Table XI (SEQ ID NOS: 43 through 50).

TABLE XI
M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotides    Sequence (5' to 3')
084                 GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027                 TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028                 TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029                 TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom
Oligonucleotides    Sequence (5' to 3')
085                 TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031                 GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032                 TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033                 GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA



The above oligonucleotides except for the
terminal oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ
ID NO: 47) of Table XI were mixed, phosphorylated, annealed
and ligated to form a double stranded insert as described
in Example I. However, instead of cloning directly into
the intermediate vector the insert was first amplified by
PCR using the terminal oligonucleotides 084 (SEQ ID NO: 43)
and 085 (SEQ ID NO: 47) as primers. The terminal
oligonucleotide 084 (SEQ ID NO: 43) contains a Hind III
site 10 nucleotides internal to its 5' end.
Oligonucleotide 085 (SEQ ID NO: 47) has an Eco RI site at
its 5' end. Following amplification, the products were
restricted with Hind III and Eco RI and ligated as
described in Example I into the polylinker of M13mp18
digested with the same two enzymes. The resultant double
stranded insert contained a ribosome binding site, a
translation initiation codon followed by a leader sequence
and three restriction enzyme sites for cloning random
oligonucleotides (Xho I, Stu I, Spe I). The vector was
named M13IX04.
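The restriction sites named in this paragraph can be located in the Table XI sequences with a simple string search. The sketch below is illustrative only: the recognition sequences are the standard ones for the named enzymes, the top-strand order 084-027-028-029 is an assumption inferred from the overlap of the bottom-strand oligonucleotides, and all function names are hypothetical.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {
    "Hind III": "AAGCTT",
    "Eco RI": "GAATTC",
    "Xho I": "CTCGAG",
    "Stu I": "AGGCCT",
    "Spe I": "ACTAGT",
}

# Top-strand oligonucleotides of Table XI, 5' to 3'.
TOP_STRAND = {
    "084": "GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG",
    "027": "TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT",
    "028": "TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC",
    "029": "TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start offsets]} for every site found in seq."""
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


def assembled_top_strand(oligos=TOP_STRAND, order=("084", "027", "028", "029")):
    """Concatenate the top-strand oligonucleotides in the assumed order."""
    return "".join(oligos[name] for name in order)


if __name__ == "__main__":
    print("084:", find_sites(TOP_STRAND["084"]))
    print("029:", find_sites(TOP_STRAND["029"]))
    print("assembled:", find_sites(assembled_top_strand()))
```

In this sketch the Hind III site of oligonucleotide 084 starts at offset 10, in agreement with the text, and the Xho I site appears only in the assembled strand because it spans the junction between oligonucleotides 028 and 029.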

During cloning of the double-stranded insert, it
was found that one of the GCC codons in oligonucleotide
028 and its complement in 031 was deleted. Since this
deletion did not affect function, the final construct is
missing one of the two GCC codons. Additionally,
oligonucleotide 032 contained a GTG codon where a GAG codon
was needed. Mutagenesis was performed using the
oligonucleotide 5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 51)
to convert the codon to the desired sequence. The
resultant intermediate vector was named M13IX04B.

The fourth step in constructing M13IX30 involved
inserting the expression and cloning sequences from
M13IX04B upstream of the pseudo-wild type gVIII in
M13IX01F. This was accomplished by digesting M13IX04B with
Dra III and Bam HI and gel isolating the 700 base pair
insert containing the sequences of interest. M13IX01F was
likewise digested with Dra III and Bam HI. The insert was
combined with the double digested vector at a molar ratio
of 3:1 and ligated as described in Example I. It should be
noted that all modifications in the vectors described
herein were confirmed by sequence analysis. The sequence
of the final construct, M13IX30, is shown in Figure 7 (SEQ
ID NO: 3). Figure 4 also shows M13IX30 where each of the
elements necessary for surface expression of randomized
oligonucleotides is marked.
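The 3:1 insert-to-vector molar ratio used in the ligation translates into a mass ratio once the fragment lengths are known, since for double-stranded DNA the number of moles is proportional to mass divided by length. The sketch below shows that arithmetic. The vector fragment length and vector mass used in the example call are hypothetical placeholders, since neither is stated in this example; only the 700 base pair insert length and the 3:1 ratio come from the text.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Moles of double-stranded DNA are proportional to mass / length, so
    insert_ng / insert_bp = molar_ratio * vector_ng / vector_bp.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp


if __name__ == "__main__":
    # Hypothetical values: 100 ng of a 7000 bp vector fragment.
    print(insert_mass_ng(vector_ng=100.0, vector_bp=7000, insert_bp=700))
```

With these placeholder values, about 30 ng of the 700 base pair insert would correspond to a 3:1 molar ratio.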

Library Construction, Screening and Characterization of 
Encoded Oligonucleotides 

Construction of an M13IX30 surface expression
library is accomplished identically to that described in
Example I for sublibrary construction except the
oligonucleotides described above are inserted into M13IX30
by mutagenesis instead of by ligation. The library is
constructed and propagated on MK30-3 (BMB) and phage stocks
are prepared for infection of XL1 cells and screening. The
surface expression library is screened and encoding
oligonucleotides characterized as described in Example I.



EXAMPLE III 

Isolation and Characterization of Peptide Ligands
Generated from Right and Left Half
Degenerate Oligonucleotides

This example shows the construction and
expression of a surface expression library of degenerate
oligonucleotides. The encoded peptides of this example
derive from the mixing and joining together of two separate
oligonucleotide populations. Also demonstrated is the
isolation and characterization of peptide ligands and their
corresponding nucleotide sequence for specific binding
proteins.

Synthesis of Oligonucleotide Populations

A population of left half degenerate
oligonucleotides and a population of right half degenerate
oligonucleotides were synthesized using standard automated
procedures as described in Example I.

The degenerate codon sequences for each
population of oligonucleotides were generated by
sequentially synthesizing the triplet NNG/T where N is an
equal mixture of all four nucleotides. The antisense
sequence for each population of oligonucleotides was
synthesized and each population contained 5' and 3'
flanking sequences complementary to the vector sequence.
The complementary termini were used to incorporate each
population of oligonucleotides into their respective
vectors by standard mutagenesis procedures. Such
procedures have been described previously in Example I and
in the Detailed Description. Synthesis of the antisense
sequence of each population was necessary since the
single-stranded form of the vectors is obtained only as the
sense strand.

The left half oligonucleotide population was
synthesized having the following sequence:
5'-AGCTCCCGGATGCCTCAGAAGATG(A/CNN)5GGCTTTTGCCACAGGGG-3' (SEQ ID
NO: 52). The right half oligonucleotide population was
synthesized having the following sequence:
5'-CAGCCTCGGATCCGCC(A/CNN)10ATG(A/C)GAAT-3' (SEQ ID NO: 53).
These two oligonucleotide populations when incorporated
into their respective vectors and joined together encode a
20 codon oligonucleotide having 19 degenerate positions and
an internal predetermined codon sequence.
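The coding capacity of the NNG/T triplet can be enumerated directly. The following is an illustrative sketch using the standard genetic code; the function names are hypothetical. It lists every sense codon of the form NN(G/T), counts the amino acids and stop codons specified, and confirms that the antisense form of each codon begins with A or C, matching the (A/C)NN notation of the synthesized sequences.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nnk_codons():
    """All sense codons of the form NN(G/T)."""
    return ["".join(p) for p in product("ACGT", "ACGT", "GT")]


def antisense(codon):
    """Reverse complement of a codon, i.e. its (A/C)NN antisense form."""
    return codon.translate(COMPLEMENT)[::-1]


def summarize(codons):
    """Return (number of codons, sorted amino acids, sorted stop codons)."""
    amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), amino_acids, stops


if __name__ == "__main__":
    n, amino_acids, stops = summarize(nnk_codons())
    print(n, "codons;", len(amino_acids), "amino acids; stops:", stops)
```

The enumeration gives 32 codons that together specify all twenty amino acids, with the amber codon TAG as the only stop codon in the set.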

Vector Construction 

Modified forms of the previously described
vectors were used for the construction of right and left
half sublibraries. The construction of left half
sublibraries was performed in an M13-based vector termed
M13ED03. This vector is a modified form of the previously
described M13IX30 vector and contains all the essential
features of both M13IX30 and M13IX22. M13ED03 contains, in
addition to a wild type and a pseudo-wild type gene VIII,
sequences necessary for expression and two Fok I sites for
joining with a right half oligonucleotide sublibrary.
Therefore, this vector combines the advantages of both
previous vectors in that it can be used for the generation
and expression of surface expression libraries from a
single oligonucleotide population or it can be joined with
a sublibrary to bring together right and left half
oligonucleotide populations into a surface expression
library.

M13ED03 was constructed in two steps from
M13IX30. The first step involved the modification of
M13IX30 to remove a redundant sequence and to incorporate
a sequence encoding the eight amino-terminal residues of
human β-endorphin. The leader sequence was also mutated to
increase secretion of the product.

During construction of M13IX04 (an intermediate
vector to M13IX30 which is described in Example II), a six
nucleotide sequence was duplicated in oligonucleotide 027
(SEQ ID NO: 44) and its complement 032 (SEQ ID NO: 49).
This sequence, 5'-TTACCG-3', was deleted by mutagenesis in
the construction of M13ED01. The oligonucleotide used for
the mutagenesis was 5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ
ID NO: 54). The mutation in the leader sequence was
generated using the oligonucleotide
5'-GGGCTTTTGCCACAGGGGT-3' (SEQ ID NO: 55). This mutagenesis
resulted in the A residue at position 6353 of M13IX30 being
changed to a G residue. The resultant vector was designated
M13IX32.

To generate M13ED01, the nucleotide sequence
encoding β-endorphin (8 amino acid residues of β-endorphin
plus 3 extra amino acid residues) was incorporated after
the leader sequence by mutagenesis. The oligonucleotide
used had the following sequence:
5'-AGGGTCATCGCCTTCAGCTCCGGATCCCTCAGAAGTCATAAACCCCCCATAGGC
TTTTGCCAC-3' (SEQ ID NO: 56). This mutagenesis also
removed some of the downstream sequences through the Spe I
site.

The second step in the construction of M13ED03
involved vector changes which put the β-endorphin sequence
in frame with the downstream pseudo-gene VIII sequence and
incorporated a Fok I site for joining with a sublibrary of
right half oligonucleotides. This vector was designed to
incorporate oligonucleotide populations by mutagenesis
using sequences complementary to those flanking or
overlapping with the encoded β-endorphin sequence. The
absence of β-endorphin expression after mutagenesis can
therefore be used to measure the mutagenesis frequency. In
addition to the above vector changes, M13ED03 was also
modified to contain an amber codon at position 3262 for
biological selection during joining of right and left half
sublibraries.

The mutations were incorporated using standard
mutagenesis procedures as described in Example I. The
frame shift changes and Fok I site were generated using the
oligonucleotide
5'-TCGCCTTCAGCTCCCGGATGCCTCAGAAGCATGAACCCCCCATAGGC-3' (SEQ ID
NO: 57). The amber codon was generated using the
oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO:
58). The full sequence of the resultant vector, M13ED03,
is provided in Figure 8 (SEQ ID NO: 4).

The construction of right half oligonucleotide
sublibraries was performed in a modified form of the
M13IX42 vector. The new vector, M13IX421, is identical to
M13IX42 except that the amber codon between the Eco RI-Sac I
cloning site and the pseudo-gene VIII sequence was removed.
This change ensures that all expression off of the Lac Z
promoter produces a peptide-gene VIII fusion protein.
Removal of the amber codon was performed by mutagenesis
using the following oligonucleotide:
5'-GCCTTCAGCCTCGGATCCGCC-3' (SEQ ID NO: 59). The full
sequence of M13IX421 is shown in Figure 9 (SEQ ID NO: 5).

Library Construction, Screening and Characterization of
Encoded Oligonucleotides

A sublibrary was constructed for each of the
previously described degenerate populations of
oligonucleotides. The left half population of
oligonucleotides was incorporated into M13ED03 to generate
the sublibrary M13ED03.L and the right half population of
oligonucleotides was incorporated into M13IX421 to generate
the sublibrary M13IX421.R. Each of the oligonucleotide
populations was incorporated into its respective vector
using site-directed mutagenesis as described in Example I.
Briefly, the nucleotide sequences flanking the degenerate
codon sequences were complementary to the vector at the
site of incorporation. The populations of nucleotides were
hybridized to single-stranded M13ED03 or M13IX421 vectors
and extended with T4 DNA polymerase to generate a
double-stranded circular vector. Mutant templates were
obtained by uridine selection in vivo, as described by
Kunkel et al., supra. Each of the vector populations was
electroporated into host cells and propagated as described
in Example I.

The random joining of right and left half
sublibraries into a single surface expression library was
accomplished as described in Example I except that prior to
digesting each vector population with Fok I they were first
digested with an enzyme that cuts in the unwanted portion
of each vector. Briefly, M13ED03.L was digested with Bgl
II (cuts at 7094) and M13IX421.R was digested with Hind III
(cuts at 3919). Each of the digested populations was
further treated with alkaline phosphatase to ensure that
the ends would not religate and then digested with an
excess of Fok I. Ligation, electroporation and
propagation of the resultant library were performed as
described in Example I.

The surface expression library was screened for 
ligand binding proteins using a modified panning procedure. 

10 Briefly, 1 ml of the library, about 10^^ phage particles, 
was added to 1-5 µg of the ligand binding protein. The
ligand binding protein was either an antibody or receptor
globulin (Rg) molecule, Aruffo et al., Cell 61:1303-1313
(1990), which is incorporated herein by reference. Phage
were incubated with shaking with affinity ligand at room
temperature for 1 to 3 hours followed by the addition of
200 µl of 1 µm latex beads (Biosite, San Diego, CA) which
were coated with goat-antimouse IgG. This mixture was
incubated with shaking for an additional 1-2 hours at room
temperature. Beads were pelleted for 2 minutes by
centrifugation in a microfuge and washed with TBS which can
contain 0.1% Tween 20. Three additional washes were
performed where the last wash did not contain any Tween 20.

Beads containing bound phage were added to plates at
a concentration that produces a suitable density for plaque
identification screening and sequencing of positive clones
(i.e., plated at confluency for rare clones and 200-500
plaques/plate if pure plaques were needed). Briefly,
plaques grown for about 6 hours at 37°C were overlaid with
nitrocellulose filters that had been soaked in 2 mM IPTG
and briefly dried. The filters remained on the plaques
overnight at room temperature and were then removed and
placed in blocking solution for 1-2 hours. Following
blocking, the filters were incubated in 1 µg/ml ligand
binding protein in blocking solution for 1-2 hours at room
temperature. Goat antimouse Ig-coupled alkaline phosphatase
(Fisher) was added at a 1:1000 dilution and the filters were
rapidly washed with 10 ml of TBS or block solution over a
glass vacuum filter. Positive plaques were identified after
alkaline phosphatase development for detection.

Alternatively, the bound phage were eluted from the
beads using 200 µl 0.1 M Glycine-HCl, pH 2.2, for 15
minutes and the beads were removed by centrifugation. The
supernatant containing phage (eluate) was removed and phage
exhibiting binding to the ligand binding protein were
further enriched by one to two more cycles of panning. The
eluates were screened by plaque formation, as described 
above. Typical yields after the first eluate were about 1 
X 10* - 5 X 10^ pfu. The second and third eluate generally 
yielded about 5 x 10* - 2 x 10^ pfu and 5 x 10' - 1 x 10^^ 

15 pfu, respectively. 

Screening of the degenerate oligonucleotide
library with several different ligand binding proteins
resulted in the identification of peptide sequences which
bound to each of the ligands. For example, screening with
an antibody to β-endorphin resulted in the detection of
about 30-40 different clones which essentially all had the
core amino acid sequence known to interact with the
antibody. The sequences flanking the core sequences were
different, showing that they were independently derived and
not duplicates of the same clone. Screening with an
antibody known as 57 gave similar results (i.e., a core
consensus sequence was identified but the flanking
sequences among the clones were different).

EXAMPLE IV

Generation of a Left Half Random Oligonucleotide Library

This example shows the synthesis and construction
of a left half random oligonucleotide library.

A population of random oligonucleotides nine
codons in length was synthesized as described in Example I
except that different sequences at their 5' and 3' ends
were synthesized so that they could be easily inserted into
the vector by mutagenesis. Also, the mixing and dividing
steps for generating random distributions of reaction
products were performed by the alternative method of
dispensing equal volumes of bead suspensions. The liquid
chosen that was dense enough for the beads to remain
dispersed was 100% acetonitrile.

Briefly, each column was prepared for the first
coupling reaction by suspending 22 mg (1 µmole) of 48 µmol/g
capacity beads (Genta, San Diego, CA) in 0.5 ml of 100%
acetonitrile. These beads are smaller than those described
in Example I and are derivatized with a guanine nucleotide.
They also do not have a controlled pore size. The bead
suspension was then transferred to an empty reaction
column. Suspensions were kept relatively dispersed by
gently pipetting the suspension during transfer. Columns
were plugged and monomer coupling reactions were performed
as shown in Table XII.

Table XII

Column        Sequence (5' to 3')
column 1L     AA(A/C)GGCTTTTGCCACAGG
column 2L     AG(A/G)GGCTTTTGCCACAGG
column 3L     AT(A/G)GGCTTTTGCCACAGG
column 4L     AC(A/G)GGCTTTTGCCACAGG
column 5L     CA(G/T)GGCTTTTGCCACAGG
column 6L     CT(G/C)GGCTTTTGCCACAGG
column 7L     AG(T/C)GGCTTTTGCCACAGG
column 8L     AT(T/C)GGCTTTTGCCACAGG
column 9L     CC(A/C)GGCTTTTGCCACAGG
column 10L    T(A/T)TGGCTTTTGCCACAGG

After coupling of the last monomer, the columns
were unplugged as described previously and their contents
were poured into a 1.5 ml microfuge tube. The columns were
rinsed with 100% acetonitrile to recover any remaining
beads. The volume used for rinsing was determined so that
the final volume of total bead suspension was about 100 µl
for each new reaction column that the beads would be
aliquoted into. The mixture was vortexed gently to produce
a uniformly dispersed suspension and then divided, with
constant pipetting of the mixture, into equal volumes.
Each mixture of beads was then transferred to an empty
reaction column. The empty tubes were washed with a small
volume of 100% acetonitrile and also transferred to their
respective columns. Random codon positions 2 through 9
were then synthesized as described in Example I where the
mixing and dividing steps were performed using a suspension
in 100% acetonitrile. The coupling reactions for codon
positions 2 through 9 are shown in Table XIII.

Table XIII

Column        Sequence (5' to 3')
column 1L     AA(A/C)A
column 2L     AG(A/G)A
column 3L     AT(A/G)A
column 4L     AC(A/G)A
column 5L     CA(G/T)A
column 6L     CT(G/C)A
column 7L     AG(T/C)A
column 8L     AT(T/C)A
column 9L     CC(A/C)A
column 10L    T(A/T)TA
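The bead quantities given in this example can be checked with simple arithmetic. The sketch below is illustrative only, with hypothetical function names: it computes the loading of 22 mg of beads at 48 µmol/g, and the total suspension volume needed when the pooled beads are redistributed at about 100 µl per new reaction column, assuming the ten columns of Tables XII and XIII.

```python
def bead_loading_umol(mass_mg, capacity_umol_per_g):
    """Synthesis scale (µmol) for a given bead mass and loading capacity."""
    return mass_mg / 1000.0 * capacity_umol_per_g


def pooled_volume_ul(n_columns, ul_per_column=100.0):
    """Total suspension volume (µl) to split into equal per-column aliquots."""
    return n_columns * ul_per_column


if __name__ == "__main__":
    print(bead_loading_umol(22.0, 48.0))   # loading per column, in µmol
    print(pooled_volume_ul(10))            # pooled volume for ten columns, in µl
```

The computed loading is slightly above 1 µmol per column, consistent with the approximate figure stated in the text, and ten aliquots of about 100 µl correspond to a pooled suspension of about 1 ml.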



After coupling of the last monomer for the ninth
codon position, the reaction products were mixed and a
portion was transferred to an empty reaction column.
Columns were plugged and the following monomer coupling
reactions were performed: 5'-CGGATGCCTCAGAAGCCCCXXA-3'
(SEQ ID NO: 60). The resulting population of random
oligonucleotides was purified and incorporated by
mutagenesis into the left half vector M13ED04.

M13ED04 is a modified version of the M13ED03
vector described in Example III and therefore contains all
the features of that vector. The difference between
M13ED03 and M13ED04 is that M13ED04 does not contain the
five amino acid sequence (Tyr Gly Gly Phe Met) recognized
by anti-β-endorphin antibody. This sequence was deleted by
mutagenesis using the oligonucleotide
5'-CGGATGCCTCAGAAGGGCTTTTGCCACAGG-3' (SEQ ID NO: 61). The
entire nucleotide sequence of this vector is shown in
Figure 10 (SEQ ID NO: 6).

EXAMPLE V 

Generation of Soluble, Conformationally-Constrained
Random Peptides

This example shows the synthesis and construction
of expressible oligonucleotides encoding soluble peptides
having a constrained secondary structure in solution.

As noted previously, the binding affinity of a
peptide for a ligand-binding protein is a function of the
primary and secondary structure of the peptide. The effect
of primary structure on affinity may be determined as
disclosed in the above examples.

In its broadest form, the disclosed method
provides oligonucleotides that are synthesized having a
desired bias of predetermined codons such that the
oligonucleotides encode peptides having a constrained
secondary structure in aqueous solution. In a preferred
embodiment, oligonucleotides encoding peptides having a
constrained secondary structure are synthesized having a
desired bias of predetermined codons such that the
predetermined codons are separated by at least one random
codon.

Oligonucleotides having more than one tuplet
encoding an amino acid capable of forming a covalent bond
at a predetermined position and the remaining positions
having random tuplets are synthesized using the methods
described herein. The synthesis steps are similar to those
outlined above using twenty or fewer reaction vessels except
that prior to synthesis of the specified codon position,
the dividing of the supports into separate reaction vessels
for synthesis of different codons is omitted. For example,
if the codon at the second position of the oligonucleotide
is to be specified, then following synthesis of random
codons at the first position and mixing of the supports,
the mixed supports are not divided into new reaction
vessels but, instead, are contained in a single reaction
vessel to synthesize the specified codon. The specified
codon is synthesized sequentially from individual monomers
as described above. Thus, the number of reaction vessels
is increased or decreased at each step to allow for the
synthesis of a specified codon or a desired number of
random codons.
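
The split-and-mix scheme described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the procedure of Example I: it uses one hypothetical codon per amino acid (`RANDOM_CODONS`), treats each support as a growing string, and ignores synthesis direction and coupling chemistry. In `plan`, a `None` entry stands for a random position (supports divided among vessels, then mixed) and a codon string stands for a specified position (supports kept in a single vessel).

```python
import random

# Hypothetical codon set: one standard codon per amino acid (an assumption,
# not the vessel assignments of Example I).
RANDOM_CODONS = [
    "GCT", "TGT", "GAT", "GAG", "TTT", "GGT", "CAT", "ATT", "AAG", "CTG",
    "ATG", "AAT", "CCT", "CAG", "CGT", "TCT", "ACT", "GTT", "TGG", "TAT",
]


def synthesize(n_supports, plan, rng):
    """Simulate split-and-mix synthesis on n_supports solid supports.

    plan holds one entry per codon position: None for a random codon
    (supports are split among vessels) or a codon string for a specified
    codon (supports stay together in a single vessel).
    """
    supports = [""] * n_supports
    for step in plan:
        if step is None:
            # Split: each support lands in one vessel and receives that
            # vessel's codon.
            supports = [s + rng.choice(RANDOM_CODONS) for s in supports]
        else:
            # No split: every support receives the specified codon.
            supports = [s + step for s in supports]
        # Mix the supports before the next codon position.
        rng.shuffle(supports)
    return supports


# Random first position, cysteine codon TGT at the second, random third.
pool = synthesize(200, [None, "TGT", None], random.Random(0))
print(pool[:5])
```

Every simulated oligonucleotide carries the specified codon at the second position, while the first and third positions vary across the pool.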

Alternatively, a population of random left and
right precursor oligonucleotides is synthesized
essentially as described in Example I, except that at least
one predetermined codon encoding cysteine, lysine, glutamic
acid, leucine or tyrosine is incorporated into each
oligonucleotide. Combination of right and left
oligonucleotides results in a single oligonucleotide
containing at least two predetermined codons.
Alternatively, a population of random oligonucleotides is
synthesized as described in Example II, except that at
least two predetermined codons encoding cysteine, lysine,
glutamic acid, leucine or tyrosine are incorporated into
only one of the two precursor oligonucleotide populations.

Following expression of the oligonucleotides, a
peptide having a constrained secondary structure is
obtained by allowing the formation of at least one
intra-peptide covalent bond. One skilled in the art would know
the conditions necessary to allow formation of the
particular covalent bond. See, for example, Proteins,
Structures and Molecular Principles, Creighton, T.E. ed.,
W.H. Freeman and Co., New York (1984), incorporated herein
by reference. Although oligonucleotides can encode
peptides capable of forming more than one intra-peptide
covalent bond, only one such bond is necessary to form a
conformationally-constrained peptide.

The peptide libraries are expressed on the
surface of a cell, for example, a bacteriophage. Phage
expressing peptide ligands are initially identified by
panning, essentially as described in Example I, except that
the phage are first incubated in the presence of a
ligand-binding protein (in this example, an antibody), then panned
in protein A-coated dishes. Individual phage populations
are purified through three rounds of plaque purification,
essentially as described in Example I.

Two phage encoding peptides showing significantly
higher ligand binding affinity than the general phage
population are isolated, the oligonucleotide sequences are
determined and the amino acid sequences deduced. The
ligand binds with highest affinity to a twenty-two amino
acid peptide having the sequence TQSKCSTDHWLGYIEYFIMCTY
(SEQ. ID. NO.: 62). The ligand also binds with high
affinity to a peptide having the sequence
CDDQYYTDHEQGKCEVALYYTG (SEQ. ID. NO.: 63).

The above-identified peptides are each capable of
forming several intra-peptide covalent bonds. For example,
a disulfide bond may form between two cysteine residues, an
ε-(γ-glutamyl)-lysine bond may form between lysine and
glutamic acid residues, a lysinonorleucine bond may form
between lysine and leucine residues or a dityrosine bond
can form between two tyrosine residues (Devlin, Textbook of
Biochemistry 3d ed. (1992)). In addition, other peptides
can be constructed that contain, for example, four lysine
residues, which can form the heterocyclic structure of
desmosine.
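
The residue pairings listed above can be enumerated for any peptide sequence. The following is a minimal sketch that looks only at which residues are present; it assumes the pairings exactly as named in the text, says nothing about geometry or reaction conditions, and the names `BOND_PAIRS` and `candidate_bonds` are illustrative.

```python
from itertools import combinations

# Residue pairings named in the text (one-letter codes); the bond names are
# taken from the paragraph above, not from an independent chemistry model.
BOND_PAIRS = {
    "disulfide": ("C", "C"),
    "epsilon-(gamma-glutamyl)-lysine": ("K", "E"),
    "lysinonorleucine": ("K", "L"),
    "dityrosine": ("Y", "Y"),
}


def candidate_bonds(peptide):
    """Return {bond name: [(i, j), ...]} using 1-based residue positions."""
    found = {}
    for name, (a, b) in BOND_PAIRS.items():
        pairs = []
        # Consider every pair of residues once, in sequence order.
        for (i, x), (j, y) in combinations(enumerate(peptide, start=1), 2):
            if {x, y} == {a, b}:
                pairs.append((i, j))
        found[name] = pairs
    return found


bonds = candidate_bonds("TQSKCSTDHWLGYIEYFIMCTY")  # SEQ. ID. NO.: 62
for name, pairs in bonds.items():
    print(name, pairs)
```

For the SEQ. ID. NO.: 62 peptide this lists a single candidate disulfide pair (positions 5 and 20) alongside several candidate pairs of the other types.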



The nature of the covalent bond in the peptide
having the sequence TQSKCSTDHWLGYIEYFIMCTY (SEQ. ID. NO.:
62) is determined by examining the effect of amino acid
substitutions on the binding affinity of the ligand, by
methods known to those skilled in the art, and described
herein. Creighton, supra, pp. 335-396, incorporated herein
by reference.

The oligonucleotide encoding this peptide is
cloned into a vector that allows secretion of the
expressed peptide. The peptide TQSKCSTDHWLGYIEYFIMCTY
(SEQ. ID. NO.: 62) is soluble at a concentration of 4
mg/ml. The same peptide, except containing the
substitution of alanine for cysteine, is insoluble at this
concentration.



EXAMPLE VI

Binding Studies Using Conformationally Constrained
Peptides

The association constant (K_a), dissociation
constant (K_d) and affinity constant (K) were determined for
the reaction of a monoclonal antibody with the linear or
the cyclized form of a peptide, using a BIAcore automated
biosensor (Pharmacia Biosensor AB, Uppsala, Sweden), as
described by Karlsson et al., J. Immunol. Meth. 145:229-240
(1991). A 24 amino acid peptide, TQSKCSTDEWLGYIEYFIMCTYRR
(SEQ. ID. NO.: 64), which is recognized by the J2B9
monoclonal antibody, was used for these experiments. The
peptide contains two cysteine residues that form a
disulfide bond under oxidizing conditions.

The cyclized form of the peptide was immobilized
by its amino terminus to the BIAcore sensor chip and
exposed to 0.016, 0.033, 0.066, 0.13 or 2.3 nM solutions of
the J2B9 antibody. Changes in refractive index were
measured and the formulas described by Karlsson et al.,
supra, were used to calculate the following rate and
affinity constants: K_a = 3.7 x 10^5 M^-1 s^-1; K_d = 4.5 x 10^-4
sec^-1 and K = 8.4 x 10^8 M^-1.
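
As a consistency check on these constants, the affinity constant should equal the ratio of the association and dissociation rate constants. The sketch below assumes K_a = 3.7 x 10^5 M^-1 s^-1 and K_d = 4.5 x 10^-4 s^-1; the mantissas are from the text, but the exponents are poorly legible in the source and are an assumption here. The variable names are illustrative only.

```python
import math

# Assumed values: mantissas from the text, exponents reconstructed (assumption).
k_assoc = 3.7e5    # association rate constant, M^-1 s^-1
k_dissoc = 4.5e-4  # dissociation rate constant, s^-1

# Affinity constant as the ratio of the two rate constants, in M^-1.
affinity = k_assoc / k_dissoc

# Half-life of the antibody-peptide complex implied by the dissociation rate.
half_life_s = math.log(2) / k_dissoc

print(f"K = {affinity:.2e} M^-1")
print(f"complex half-life = {half_life_s / 60:.1f} min")
```

Under these assumptions the ratio comes out near 8 x 10^8 M^-1, in line with the reported affinity constant.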

After the above-described measurements were
obtained, the disulfide bond was reduced by treating the
cyclized peptide with 10 mM dithiothreitol, while the
peptide was still attached to the BIAcore sensor chip. The
dissociation rate of the linear peptide and the J2B9
monoclonal antibody was then determined, as described
above.

The dissociation rate of the J2B9 antibody and
the linear peptide was calculated to be 1.54 x 10^-3 sec^-1.
Thus, the antibody dissociated from the linear peptide
three times faster than it dissociated from the cyclized
peptide. Reoxidation of the linearized peptide to reform
the cyclized peptide resulted in the dissociation rate
again decreasing to the 10^-4 range. These results show that
a conformationally constrained peptide binds a specific
receptor with greater affinity than a peptide with a less
stable secondary structure.

EXAMPLE VII

Soluble, Conformationally-Constrained Random Peptides
Having High Affinity to An Anti-Tetanus Toxin Antibody

This example shows the synthesis and construction of
expressible random oligonucleotides encoding soluble
peptides with constrained secondary structures and the
selection of high affinity binders to an anti-tetanus
toxin antibody.

Oligonucleotide Synthesis 

Random oligonucleotides of ten codons in length were
synthesized as right and left half precursors essentially
as described in Example I. When combined, they yield an
oligonucleotide coding for twenty amino acid long random
peptides. Codons for cysteine were used to produce
peptides with a potential for forming covalent bonds for
secondary structure constraints. In contrast to that
described in Example V where the amino acids used for
cyclization of the peptides were placed at predetermined
positions, the cysteine codons were introduced at all
positions with a predetermined bias compared to the other
nineteen random codons.

Briefly, ten reaction vessels were used for the
synthesis of twenty random codons at each codon position
essentially as described in Example I. In addition to
the normal ten reaction vessels used for synthesis, an
extra two reaction vessels were used for the synthesis of
the two cysteine codons, TGC and TGT. Thus, the
synthesis procedure used a total of twelve reaction
vessels for the synthesis of each codon position where
the frequency of cysteine codons at each position is
twenty percent. The 5' and 3' flanking sequences for the
right and left half oligonucleotides were those described
in Example I. The use of the extra two vessels encoding
cysteine residues results in the increased frequency of
cysteine being incorporated at each codon position. This
increased frequency ensures the presence of residues
capable of forming covalent bonds for constraining the
peptide's secondary structure. Moreover, the random
incorporation of cysteines at each of the codon
positions, instead of incorporation at predetermined
positions, increases the probability of obtaining
peptides with a constrained conformation and, thus, a
high affinity toward a binding protein since a greater
number of peptides are available to screen.
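
The cysteine bias can be estimated with simple arithmetic. The sketch below assumes, although the text does not state it, that the supports are divided equally among the twelve vessels and that each of the ten standard vessels yields its two codons in equal amounts, with cysteine among the twenty. It gives the per-position cysteine frequency and the chance that a twenty-residue peptide contains at least two cysteines, the minimum needed for a disulfide bond.

```python
from fractions import Fraction
from math import comb

STANDARD_VESSELS = 10   # each assumed to yield two codons in equal amounts
CYSTEINE_VESSELS = 2    # TGC and TGT
TOTAL_VESSELS = STANDARD_VESSELS + CYSTEINE_VESSELS

# Cysteine contributed by the two dedicated vessels.
from_extra = Fraction(CYSTEINE_VESSELS, TOTAL_VESSELS)

# Cysteine contributed by the standard vessels, assuming cysteine is one of
# the twenty codons they produce (1/20 of their share).
from_standard = Fraction(STANDARD_VESSELS, TOTAL_VESSELS) * Fraction(1, 20)

p_cys = from_extra + from_standard


def prob_at_least_two(p, length=20):
    """Binomial probability of two or more cysteines in a peptide."""
    p = float(p)
    p0 = (1 - p) ** length
    p1 = comb(length, 1) * p * (1 - p) ** (length - 1)
    return 1 - p0 - p1


print(f"cysteine frequency per position: {float(p_cys):.3f}")
print(f"P(at least two cysteines in 20 residues): {prob_at_least_two(p_cys):.3f}")
```

Under these assumptions the per-position frequency is 5/24, close to the twenty percent stated above.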

Library Construction and Screening 

Libraries were constructed from right and left half
oligonucleotides as described in Example
I. The libraries were screened for peptides that bind to
an anti-tetanus toxin antibody essentially as described
in Example III. After two rounds of panning, eight phage
clones were selected that showed high affinity binding to
the antibody. Sequencing of the encoding nucleic acids
revealed seven peptides having cysteines spaced ten
residues apart and one peptide having cysteines spaced
seven residues apart. The sequences are shown in Table
XIV and are listed in the sequence listing as SEQ ID
NOS: 65 through 72.

Table XIV

Conformationally Constrained Peptides Having High
Affinity for Anti-Tetanus Toxin Antibody

SEQ ID NO:    PEPTIDE SEQUENCE

65            TCLREEFILQCYIVMIEDWY
66            ICEHHQMLLQCSLVCEECMM
67            KCIIGWYTLTCYMSDRPRME
68            ACTQDMNWITCPMYCEVLCF
69            VCFYFPFKMMCHMEYIAYEY
70            DANCGHCTYMCICKIMYYIS
71            WHRHVSSPMSCWWYDQCAVA
72            CVQIDFFTVQCNISSHMFLP
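
The cysteine positions in the Table XIV peptides can be tabulated directly from the sequences. This is a small sketch with illustrative function names; it reports raw position differences between consecutive cysteines and makes no assumption about how the text counts "residues apart". The sequences are copied from the table with scan spacing removed and letter case normalized.

```python
# Peptides of Table XIV, keyed by SEQ ID NO (spaces from the scan removed).
TABLE_XIV = {
    65: "TCLREEFILQCYIVMIEDWY",
    66: "ICEHHQMLLQCSLVCEECMM",
    67: "KCIIGWYTLTCYMSDRPRME",
    68: "ACTQDMNWITCPMYCEVLCF",
    69: "VCFYFPFKMMCHMEYIAYEY",
    70: "DANCGHCTYMCICKIMYYIS",
    71: "WHRHVSSPMSCWWYDQCAVA",
    72: "CVQIDFFTVQCNISSHMFLP",
}


def cysteine_positions(peptide):
    """Return the 1-based positions of every cysteine in the peptide."""
    return [i for i, residue in enumerate(peptide, start=1) if residue == "C"]


def cysteine_spacings(peptide):
    """Return the position differences between consecutive cysteines."""
    pos = cysteine_positions(peptide)
    return [b - a for a, b in zip(pos, pos[1:])]


for seq_id, peptide in TABLE_XIV.items():
    print(seq_id, cysteine_positions(peptide), cysteine_spacings(peptide))
```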



Although the invention has been described with
reference to the presently preferred embodiment, it
should be understood that various modifications can be
made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE: 10-NOV-1993
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 07/978,893
(B) FILING DATE: 10-NOV-1992

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Konski, Antoinette F.
(B) REGISTRATION NUMBER: 34,202
(C) REFERENCE/DOCKET NUMBER: FP-IX 9769

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (619) 535-9001
(B) TELEFAX: (619) 535-8949



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7294 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: both
(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 

TTCCCTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 4260

GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 4320 

GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 4380

ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 4440

GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 4500

AATCCAAACA ATCAGGATTA TATTGATGAA TTGCCATCAT CTGATAATCA GGAATATGAT 4560

GATAATTCCG CTCCTTCTGG TGGTTTCTTT GTTCCGCAAA ATGATAATGT TACTCAAACT 4620

TTTAAAATTA ATAACGTTCG GGCAAAGGAT TTAATACGAG TTGTCGAATT GTTTGTAAAG 4680

TCTAATACTT CTAAATCCTC AAATGTATTA TCTATTGACG GCTCTAATCT ATTAGTTGTT 4740

AGTGCACCTA AAGATATTTT AGATAACCTT CCTCAATTCC TTTCTACTGT TGATTTGCCA 4800

ACTGACCAGA TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTTTAGAT 4860

TTTTCATTTG CTGCTGGCTC TCAGCGTGGC ACTGTTGCAG GCGGTGTTAA TACTGACCGC 4920

CTCACCTCTG TTTTATCTTC TGCTGGTGGT TCGTTCGGTA TTTTTAATGG CGATGTTTTA 4980

GGGCTATCAG TTCGCGCATT AAAGACTAAT AGCCATTCAA AAATATTGTC TGTGCCACGT 5040

ATTCTTACGC TTTCAGGTCA GAAGGGTTCT ATCTCTGTTG GCCAGAATGT CCCTTTTATT 5100

ACTGGTCGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAC GATTGAGCGT 5160

CAAAATGTAG GTATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT 5220

CTGGATATTA CCAGCAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATGTTATT 5280

ACTAATCAAA GAAGTATTGC TACAACGGTT AATTTGCGTG ATGGACAGAC TCTTTTACTC 5340

GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTG GCGTACCGTT CCTGTCTAAA 5400

ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC CGCTCTGATT CCAACGAGGA AAGCACGTTA 5460

TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG 5520

TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT 5580

CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 5640

GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA 5700

TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTTGAC 5760

GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTCCAA ACTGGAACAA CACTCAACCC 5820

TATCTCGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC CACCATCAAA 5880

CAGGATTTTC GCCTGCTGGG GCAAACCAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 5940

CAGGCGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000

GCGCCCAATA CGCAAACCGC CTCTCCCCGC GCGTTGGCCG ATTCATTAAT GCAGCTGGCA 6060

CGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGTTAGCT 6120

CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTGTGGAAT 6180

TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240

GTAGGAGAGC TCGGCGGATC CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 6300

AGTTTACAGG CAAGTGCTAC TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360

GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420

GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 6480

ATGGCGAATG GCGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540

AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600

ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660

CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720

AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840

TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900 

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960 

TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020 

AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080 

TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 

TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7320 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: both
(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860

TCTGAGGGTG GCGGTTCTCA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

[illegible] TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940

[illegible] [illegible] [illegible] [illegible] [illegible] CTTATTATTG 3000

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA [illegible] 3060

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG [illegible] [illegible] [illegible] [illegible] [illegible] 3300

CTTGATTTAA [illegible] [illegible] [illegible] TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC [illegible] [illegible] [illegible] CTATTGGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA [illegible] [illegible] [illegible] [illegible] 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT [illegible] 3540

AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT [illegible] 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900

AATTTAGGTC AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGCGTTCTT 3960

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA [illegible] 4020

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT [illegible] 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG [illegible] 4200

ATTAAAAAAG GTAATTCAAA [illegible] AAATGTAATT [illegible] [illegible] 4260

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT [illegible] [illegible] 4320

TGTAACTTGG [illegible] [illegible] [illegible] [illegible] [illegible] 4380

TAGTGTTACT GTATATTCAT [illegible] [illegible] [illegible] [illegible] 4440

TGTTTTACGT GCTAATAATT [illegible] [illegible] [illegible] [illegible] 4500

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC [illegible] 4560

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820 

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTCCTGCAAC TCTCTCAGGG 5940

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

TTGTGAGCGG ATAACAATTT CACACGCCAA GGAGACAGTC ATAATGAAAT ACCTATTGCC 6240 

TACGGCAGCC GCTGGATTGT TATTACTCGC TGCCCAACCA GCCATGGCCG AGCTCGTGAT 6300 

GACCCAGACT CCAGAATTCC ATCCGGAATG AGTGTTAATT CTAGAACGCG TAAGCTTGGC 6360 

ACTGGCCGTC GTTTTACAAC GTCGTGACTG GGAAAACCCT GGCGTTACCC AACTTAATCG 6420 

CCTTGCAGCA CACCCCCCTT TCGCCAGCTG GCGTAATAGC GAAGAGGCCC GCACCGATCG 6480 

CCCTTCCCAA CAGTTGCGCA GCCTGAATGG CGAATGGCGC TTTGCCTGGT TTCCGGCACC 6540 

AGAAGCGGTG CCGGAAAGCT GGCTGGAGTG CGATCTTCCT GAGGCCGATA CGGTCGTCGT 6600

CCCCTCAAAC TGGCAGATGC ACGGTTACGA TGCGCCCATC TACACCAACG TAACCTATCC 6660

CATTACGGTC AATCCGCCGT TTGTTCCCAC GGAGAATCCG ACGGGTTGTT ACTCGCTCAC 6720 

ATTTAATGTT GATGAAAGCT GGCTACAGGA AGGCCAGACG CGAATTATTT TTGATGGCGT 6780 

TCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA ACGCGAATTT TAACAAAATA 6840

TTAACGTTTA CAATTTAAAT ATTTGCTTAT ACAATCTTCC TGTTTTTGGG GCTTTTCTGA 6900

TTATCAACCG GGGTACATAT GATTGACATG CTAGTTTTAC GATTACCGTT CATCGATTCT 6960

CTTGTTTGCT CCAGACTCTC AGGCAATGAC CTGATAGCCT TTGTAGATCT CTCAAAAATA 7020

GCTACCCTCT CCGGCATTAA TTTATCAGCT AGAACGGTTG AATATCATAT TGATGGTGAT 7080

TTGACTGTCT CCGGCCTTTC TCACCCTTTT GAATCTTTAC CTACACATTA CTCAGGCATT 7140

GCATTTAAAA TATATGAGGG TTCTAAAAAT TTTTATCCTT GCGTTGAAAT AAAGGCTTCT 7200

CCCGCAAAAG TATTACAGGG TCATAATGTT TTTGGTACAA CCGATTTAGC TTTATGCTCT 7260

GAGGCTTTAT TGCTTAATTT TGCTAATTCT TTGCCTTGCC TGTATGATTT ATTGGACGTT 7320


(2) INFORMATION FOR SEQ ID NO: 3: 










(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7445 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:








AATGCTACTA 


CTATTAGTAG 


AATTGATGCC 


ACCTTTTCAG 




aa ATrtaaaaT 

aaaXVvaaaax 


60 


ATAGCTAAAC 


AGGTTATTGA 


CCATTTGCGA 


AATGTATCTA 




TAAATCTACT 



120 


CGTTCGCAGA 


ATTGGGAATC 


AACTGTTACA 


TGGAATGAAA 




CCGTACTTTA 



180 


GTTGCATATT 


TAAAACATGT 


TGAGCTACAG 


CACCAGATTC 




CTCTAAGCCA


240 


TCTGCAAAAA 


TGACCTCTTA 


TCAAAAGGAG 


CAATTAAAGG 


X A^X W X W XAA 


TCCTGACCTG 



300 


TTGGAGTTTG 


CTTCCGGTCT 


GGTTCGCTTT 


GAAGCTCGAA 


rpfp * a * a ^rtr*fl 


ATATTTGAAG


360 


TCTTTCGGGC 


TTCCTCTTAA 


TCTTTTTGAT 


GCAATCCGCT 


X XvWX X WX\9A 


CTATAATAGT


420 


CAGGGTAAAG 


ACCTGATTTT 


TGATTTATGG 


TCATTCTCGT 


X X X WXuaAI-'X 


ArprpmaaaGPa 

X X X AAAU Wa 


480 


TTTGAGGGGG 


ATTCAATGAA 


TATTTATGAC 


GATTCCGCAG 


X AX X\9UtiiWV9C 


X aX w WAU X W X 


^ *■ u 


AAACATTTTA 


CTATTACCCC 


CTCTGGCAAA 


ACTTCTTTTG 


r» a zv ars^r"PC 

ChfWWUlU WW X \^ 


XWOWXaX XXX 


600 


GGTTTTTATC 


GTCGTCTGGT 


AAACGAGGGT 


TATGATAGTG 


X X saw X w X X AVr 


TATGCCTCGT


660 


AATTCCTTTT 


GGCGTTATGT 


ATCTGCATTA 


GTTGAATGTG 


rt*P A TTP P T a A 


ATCTCAACTG


720 


ATGAATCTTT 


CTACCTGTAA 


TAATGTTGTT 


CCGTTAGTTC 


/jmmrnmaTT'aA. 

w XXX X'VX 


CGTAGATTTT 


780 


TCTTCCCAAC 


GTCCTGACTG 


GTATAATGAG 


CCAGTTCTTA 


AAATCGCATA


AGGTAATTCA 


840 


CAATGATTAA 


AGTTGAAATT 


AAACCATCTC 


AAGCCCAATT 


XAWX awx wox 


•P^ **Prtrt'PG*!*'l"l' 

X W X w\3 X wX X X 


^ W V 


CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540




GGGATATTAT


TTTTCTTGTT


CAGGACTTAT


CTATTGTTGA


TAAACAGGCG


3600


CGTTCTGCAT


TAGCTGAACA 


TGTTGTTTAT


TGTCGTCGTC 


TGGACAGAAT 


TACTTTACCT 


3660 




CTTTATATTC


TCTTATTACT 


GGCTCGAAAA 


TGCCTCTGCC 


TAAATTACAT 


3720 


GTTGGCGTTG 


TTAAATATGG


CGATTCTCAA 


TTAAGCCCTA 


CTGTTGAGCG 


TTGGCTTTAT 


3780 


ACTGGTAAGA 


ATTTGTATAA 


CGCATATGAT 


ACTAAACAGG 


CTTTTTCTAG


TAATTATGAT 


3840 




^VX X ^ X X4%X X X 


AACGCCTTAT 


TTATCACACG 


GTCGGTATTT 


CAAACCATTA 


3900 




aSaaa^oaX VmVa 


CCTTTACTAAA 

\9\m X X Aw X AAA 


ATATATTTGA


AAAAGTTTTC 



ACGCGTTCTT 


3960 


TGTCTTGCGA 


TTGGATTTGC


aX Wa\«V.paX X X 


A O A T IT A rSTT* 
aU»aXaXawX X 


ATATlArT'OA 
aXaX aaWW<>-a 


ACCTAAGCCG 

XL w W X AA> w \9 


4020 


^ » ^/ «l|in'| TL TV TV * 


Avv\a X X w X 


X (hiAVxaV*^ X aX 


GATTTTGATA


A A T*T»f A f*T AT 
XULX X W^^W X AX 


TGACTCTTCT 


4080 




AX W X AAVV^ X A 


X wwWX#^X wX X 


TTC?AAGGATT 

X X V*AA\3\XA X X 


CTAAGGGAAA 



ATTAATTAAT 


4140 




TACAGAAGCA 


AGGTTATTCA 


CTCACATATA 


TTGATTTATG 


TACTGTTTCC 


4200 


ATTAAAAAAG 


GTAATTCAAA 


TGAAATTGTT 


AAATGTAATT 


AATTTTGTTT 


TCTTGATGTT 


4260 


TGTTTCATCA 


TCTTCTTTTG 


CTCAGGTAAT 


TGAAATGAAT 


AATTCGCCTC 


TGCGCGATTT 


4320 


TGTAACTTGG 


TATTCAAAGC 


AATCAGGCGA 


ATCCGTTATT 


GTTTCTCCCG 


ATGTAAAAGG 


4380 


TACTGTTACT 


GTATATTCAT 


CTGACGTTAA 


ACCTGAAAAT 


CTACGCAATT 


TCTTTATTTC 


4440 


X w X X X XXVwV3 X 


GCTAATAATT 


TTGATATGGT


TGGTTCAATT 


CCTTCCATAA 


TTCAGAAGTA 


4500 


X AAX ^l«hAAAV» 


AATCAGGATT 


ATATTGATGA 


ATTGCCATCA


TCTGATAATC 


AGGAATATGA 


4560 


TCATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920

CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160

TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940 

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000 

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240 

GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300 

AAGCACTATT GCACTGGCAC TCTTACCGTT ACCGTTACTG TTTACCCCTG TGACAAAAGC 6360

CGCCCAGGTC CAGCTGCTCG AGTCAGGCCT ATTGTGCCCA GGGGATTGTA CTAGTGGATC 6420 

CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT AGTTTACAGG CAAGTGCTAC 6480 

TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA GTTGGTGCTA CCATAGGGAT 6540 

TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAAGCA ATAGCGAAGA GGCCCGCACC 6600

GATCGCCCTT CCCAACAGTT GCGCAGCCTG AATGGCGAAT GGCGCTTTGC CTGGTTTCCG 6660

GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CGATACGGTC 6720 

GTCGTCCCCT CAAACTGGCA GATGCACGGT TACGATGCGC CCATCTACAC CAACGTAACC 6780 

TATCCCATTA CGGTCAATCC GCCGTTTGTT CCCACGGAGA ATCCGACGGG TTGTTACTCG 6840

CTCACATTTA ATGTTGATGA AAGCTGGCTA CAGGAAGGCC AGACGCGAAT TATTTTTGAT 6900

GGCGTTCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG AATTTTAACA 6960 

AAATATTAAC GTTTACAATT TAAATATTTG CTTATACAAT CTTCCTGTTT TTGGGGCTTT 7020 

TCTGATTATC AACCGGGGTA CATATGATTG ACATGCTAGT TTTACGATTA CCGTTCATCG 7080

ATTCTCTTGT TTGCTCCAGA CTCTCAGGCA ATGACCTGAT AGCCTTTGTA GATCTCTCAA 7140

AAATAGCTAC CCTCTCCGGC ATTAATTTAT CAGCTAGAAC GGTTGAATAT CATATTGATG 7200

GTGATTTGAC TGTCTCCGGC CTTTCTCACC CTTTTGAATC TTTACCTACA CATTACTCAG 7260

GCATTGCATT TAAAATATAT GAGGGTTCTA AAAATTTTTA TCCTTGCGTT GAAATAAAGG 7320

CTTCTCCCGC AAAAGTATTA CAGGGTCATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 7380
GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440 
ACGTT 7445 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7409 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTGGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540






AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720




TT a a. a T AT^v; 

X XaaaXaXVtW 


CGATTCTCAA 

W\JAX X Vi» X WAA 


TT Aa r* r*T A 

X XAAV7WWWXa 


TOTTi^ 
w X VJ X X VXAvWV? 


TTrtrt^TTT AT 

X XWwWX ^ XAX 


3780 


ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960


TGTCTTGCGA 


TTGiAAxTTGC 


AX (-A\3{JAX X X 


AvJAX aX A^9X X 


ivmTvmK )v^^^& 
AT AT AA w C CA 


TV ^ m jv TV *^ y^/* 
Aw W XAAuWWv7 




TV ^^^*T1fnfk IV TV K 

GAGGTTAAAA 


AVJU X AU X L> X ^ 


X WaVjaww X aX 


TV mmmqtr* tv rn TV 
iaii.X X X XuaXA 


AAxx wACXAl 


XVjAw X W X X W X 


H W O V 


CAGCGTCXTA 


* m^T*fc ^^T'fc 
A X C xAAO C X A 


X CuCx ATuxx 


ff ^ ■R K TV mm 


CTAAGsKxAAA 


AX XAAX X AAX 


*i X V 


fv IV IV mfn 

AGCGAC GAT T 


m IV ^ IV ^ TV IV ^ TV 

TACAGAAGCA 


IV ^^fnfitiv iv 

AGGTTATTCA 


^m^ IV ^ TV m TV m iv 

CTCACATATA 


TTGATTTATG 


TACTGTTTCC 




ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920




GTTTTATCTT 


Cx6CTGGTGs9 


TTCGTTCGGT 


ATTTTTAATG 


GCGATGTTTT 


^ Q Q A 




U X 1 U»vvv»vV*iix 


Xiiiwi^aiAV* Xiwl 


XawWWAX XwA 


AAAATATTGT 


^m/^ m/*» ^ TV 

wTGTGw wA wG 


9U4U 


m jv ffirp^fpm jv 


W X X X Ca^vu X W 


AVsAA^a%ar\9 X X C 


XaXWXWXvsX X 


rt/i^O Iv A ?V fP/* 

xruwwlVaAATw 


TCCCTTTTAT 


3 X U U 


X AW X Vru X \v V9 X 


U X UtA W X VJVV X 


aaX X WW WaA 


Xa9 X aaaX AaX 


/^r* TV mmfft^ TV /*■ tv 
WW1.X X xCAVjA 


wGATTVaAGwG 


3 Xo U 


TCAAAATG'PA 

X WJ^J^XUlX^VX^k 


GGTATTTCCA 


X X X X X 


X XU X X>9Wa 


AX XAvVa 


/;m R H ip jv fnm/*m 
W XAAX AX X^a X 




TC TGGAT ATT 

XWXwWAXAX X 




fPfiATafiTTT 

WwUAXAVX X X 


fiifiTTOTT^T 
WAV9 X X W X X W X 


AW X WAurVyiiwAA 


VaX wAX VaX XAX 


^ •? ft n 

3 ^ O U 


TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640

GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180

TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240

GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300

AAGCACTATT GCACTGGCAC TCTTACCGTT ACTGTTTACC CCTGTGGCAA AAGCCTATGG 6360

GGGGTTTATG ACTTCTGAGG GATCCGOAGC TGAAGGCGAT GACCCTGCTA AGGCTGCATT 6420

CAATAGTTTA CAGGCAAGTG CTACTGAGTA CATTGGCTAC GCTTGGGCTA TGGTAGTAGT 6480

TATAGTTGGT GCTACCATAG GGATTAAATT ATTCAAAAAG TTTACGAGCA AGGCTTCTTA 6540

AGCAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC AGTTGCGCAG CCTGAATGGC 6600

GAATGGCGCT TTGCCTGGTT TCCGGCACCA GAAGCGGTGC CGGAAAGCTG GCTGGAGTGC 6660

GATCTTCCTG AGGCCGATAC GGTCGTCGTC CCCTCAAACT GGCAGATGCA CGGTTACGAT 6720

GCGCCCATCT ACACCAACGT AACCTATCCC ATTACGGTCA ATCCGCCGTT TGTTCCCACG 6780

GAGAATCCGA CGGGTTGTTA CTCGCTCACA TTTAATGTTG ATGAAAGCTG GCTACAGGAA 6840

GGCCAGACGC GAATTATTTT TGATGGCGTT CCTATTGGTT AAAAAATGAG CTGATTTAAC 6900

AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTTTAC AATTTAAATA TTTGCTTATA 6960

CAATCTTCCT GTTTTTGGGG CTTTTCTGAT TATCAACCGG GGTACATATG ATTGACATGC 7020

TAGTTTTACG ATTACCGTTC ATCGATTCTC TTGTTTGCTC CAGACTCTCA GGCAATGACC 7080

TGATAGCCTT TGTAGATCTC TCAAAAATAG CTACCCTCTC CGGCATTAAT TTATCAGCTA 7140

GAACGGTTGA ATATCATATT GATGGTGATT TGACTGTCTC CGGCCTTTCT CACCCTTTTG 7200

AATCTTTACC TACACATTAC TCAGGCATTG CATTTAAAAT ATATGAGGGT TCTAAAAATT 7260

TTTATCCTTG CGTTGAAATA AAGGCTTCTC CCGCAAAAGT ATTACAGGGT CATAATGTTT 7320

TTGGTACAAC CGATTTAGCT TTATGCTCTG AGGCTTTATT GCTTAATTTT GCTAATTCTT 7380

TGCCTTGCCT GTATGATTTA TTGGACGTT 7409

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7294 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCGCTT ATGATTGACC 1080

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100 

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220 

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700 

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300 

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540 

AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720

GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTCCAA ACTGGAACAA CACTCAACCC 5820

TATCTCGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC CACCATCAAA 5880

CAGGATTTTC GCCTGCTGGG GCAAACCAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 5940

CAGGCGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000

GCGCCCAATA CGCAAACCGC CTCTCCCCGC GCGTTGGCCG ATTCATTAAT GCAGCTGGCA 6060

CGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGTTAGCT 6120 

CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTGTGGAAT 6180 

TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240

GTAGGAGAGC TCGGCGGATC CGAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 6300 

AGTTTACAGG CAAGTGCTAC TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360 

GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420 

GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 6480

ATGGCGAATG GCGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540 

AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATCCACGGTT 6600 

ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660 

CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720 

AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780 

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840

TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960

TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020 

AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080

TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200

TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7394 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 

TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 

CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 

TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 

TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 

ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040

CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100

CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160

TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220

GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280

GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340 

GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400 

GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460 

GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700

TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATCCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240

CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300

CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGACGT TCGCTAAAAC GCCTCGCGTT 3360

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420

TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540

AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760

CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820

CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880

ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940

CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000

GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180

TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240

[illegible] GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300

[illegible] ACTGTTTACC CCTGTGGCAA AAGCCCTTCT 6360

GAGGCATCCG GGAGCTGAAG GCGATGACCC TGCTAAGGCT GCATTCAATA GTTTACAGGC 6420

AAGTGCTACT GAGTACATTG GCTACGCTTG GGCTATGGTA GTAGTTATAG TTGGTGCTAC 6480

CATAGGGATT AAATTATTCA AAAAGTTTAC GAGCAAGGCT TCTTAAGCAA TAGCGAAGAG 6540

GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA ATGGCGAATG GCGCTTTGCC 6600

TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG AGTGCGATCT TCCTGAGGCC 6660

GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT ACGATGCGCC CATCTACACC 6720

AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC CCACGGAGAA TCCGACGGGT 6780

TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC AGGAAGGCCA GACGCGAATT 6840

ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT TTAACAAAAA TTTAACGCGA 6900

ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC TTATACAATC TTCCTGTTTT 6960

TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA CATGCTAGTT TTACGATTAC 7020

CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA TGACCTGATA GCCTTTGTAG 7080

ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC AGCTAGAACG GTTGAATATC 7140

ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC TTTTGAATCT TTACCTACAC 7200

ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA AAATTTTTAT CCTTGCGTTG 7260

AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA TGTTTTTGGT ACAAGCGATT 7320

TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA TTCTTTGCCT TGCCTGTATG 7380

ATTTATTGGA CGTT 7394
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC 37 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA 35
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 35 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 35 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
TACGAGCAAG GCTTCTTA 18 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AGCTTAAGAA GCCTTGCTCG TAAACTTTTT GAATAATTT 39
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AATCCCTATG GTAGCACCAA CTATAACTAC TACCAT 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
ATCGCCTTCA GCCTAG
(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
CTCGAATTCG TACATCCTGG TCATAGC 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CATTTTTGCA GATGGCTTAG A 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TAGCATTAAC GTCCAATA 
(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATATATTTTA GTAAGCTTCA TCTTCT
(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GACAAAGAAC GCGTGAAAAC TTT 23 
(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GCGGGCCTCT TCGCTATTGC TTAAGAAGCC TTGCT 35 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TTCAGCCTAG GATCCGCCGA GCTCTCCTAC CTGCGAATTC GTACATCC 48 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TGGATTATAC TTCTAAATAA TGGA 24
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 36 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
AATTCGCCAA GGAGACAGTC AT 22 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 39 
(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT 39
(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TCTAGAACGC GTC

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

ACGTGACGCG TTCTAGAATT AACACTCATT CCTGT 

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG
(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCGGCTGCC 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GTAGGCAATA GGTATTTCAT TATGACTGTC CTTGGCG 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
TGACTGTCTC CTTGGCGTGT GAAATTGTTA 
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CAATTTTATC CTAAATCTTA CCAAC 25 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CATTTTTGCA GATGGCTTAG A 21
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 
CGAAAGGGGG GTGTGCTGCA A 21 
(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
TAGCATTAAC GTCCAATA 18
(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC 

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC GCC 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GGCGTTACCC AAGCTTTGTA CATGGAGAAA ATAAAG 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
TGAAACAAAG CACTATTGCA CTGGCACTCT TACCGTTACC GT

WO 94/11496 PCT/US93/10850

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TACTGTTTAC CCCTGTGACA AAAGCCGCCC AGGTCCAGCT GC 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GGCACAATAG GCCTGACTCG AGCAGCTGGA CCAGGGCGGC TT
(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
TTGTCACAGG GGTAAACAGT AACGGTAACG GTAAGTGTGC CA 42 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 



GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGTAC AA 42 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TAACGGTAAG AGTGCCAGTG C 21 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 68 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference

(B) LOCATION: replace (25, "")

(D) OTHER INFORMATION: /note= "M represents an equal
mixture of A and C at this location and at
locations 28, 31, 34, 37, 40, 43, 46 & 49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
AGCTCCCGGA TGCCTCAGAA GATGMNNMNN MNNMNNMNNM NNMNNMNNMN NGGCTTTTGC 60

CACAGGGG 68
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: misc_difference

(B) LOCATION: replace (17, "")

(D) OTHER INFORMATION: /note= "M represents an equal
mixture of A and C at this location and at
locations 20, 23, 26, 29, 32, 35, 38, 41, 44 &

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:

CAGCCTCGGA TCCGCCMNNM NNMNNMNNMN NMNNMNNMNN MNNMNNATGM GAAT 54 
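
The degenerate positions described in the notes for SEQ ID NOs: 52 and 53 can be checked with a short script. The following Python sketch is not part of the original disclosure. It assumes the standard genetic code and assumes that the oligonucleotide is read as the strand complementary to the coding strand, so that each MNN triplet corresponds to an NNK codon (K = G or T). The names SEQ_ID_52, m_positions, expand and sense_codons are illustrative only.

```python
from itertools import product

# IUPAC codes used in the listing: M = A or C, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Standard genetic code, codon positions ordered T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 52 as listed: 24 fixed bases, nine MNN triplets, 17 fixed bases.
SEQ_ID_52 = "AGCTCCCGGATGCCTCAGAAGATG" + "MNN" * 9 + "GGCTTTTGCCACAGGGG"


def m_positions(seq):
    """1-based positions of every M in a degenerate sequence."""
    return [i + 1 for i, base in enumerate(seq) if base == "M"]


def expand(degenerate):
    """Every concrete sequence that matches a degenerate IUPAC string."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def sense_codons(antisense_triplet="MNN"):
    """Codons on the complementary strand (assumption: MNN is antisense)."""
    return sorted(reverse_complement(s) for s in expand(antisense_triplet))


codons = sense_codons()
residues = {CODON_TABLE[c] for c in codons} - {"*"}
stops = [c for c in codons if CODON_TABLE[c] == "*"]

print(len(SEQ_ID_52), m_positions(SEQ_ID_52))
print(len(codons), len(residues), stops)
```

Under these assumptions the script reports a 68-base sequence with M at positions 25 through 49 in steps of three, which agrees with the note for SEQ ID NO: 52, and shows that the 32 resulting codons cover all 20 amino acids while retaining a single stop codon (TAG).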

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

GGTAAACAGTJftACGGTAAGA GTS^ 27 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GGGCTTTTGC CACAGGGGT 19 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
AGGGTCATCG CCTTCAGCTC CGGATCCCTC AGAAGTCATA AACCCCCCAT AGGCTTTTGC 
CAC
(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
TCGCCTTCAG CTCCCGGATG CCTCAGAAGC ATGAACCCCC CATAGGC 47 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
CAATTTTATC CTAAATCTTA CCAAC 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GCCTTCAGCC TCGGATCCGC C 21 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
CGGATGCCTC AGAAGCCCCN N 21
(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CGGATGCCTC AGAAGGGCTT TTGCCACAGG 30 
(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Thr Gln Ser Lys Cys Ser Thr Asp His Trp Leu Gly Tyr Ile Glu Tyr
1 5 10 15

Phe Ile Met Cys Thr Tyr
20

(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:

Cys Asp Asp Gln Tyr Tyr Thr Asp His Glu Gln Gly Lys Cys Glu Val
1 5 10 15

Ala Leu Tyr Tyr Thr Gly
20
(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: both 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:

Thr Gln Ser Lys Cys Ser Thr Asp His Trp Leu Gly Tyr Ile Glu Tyr
1 5 10 15

Phe Ile Met Cys Thr Tyr Arg Arg
20 



(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

Thr Cys Leu Arg Glu Glu Phe Ile Leu Gln Cys Tyr Ile Val Met Ile
1 5 10 15

Glu Asp Trp Tyr 
20 

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Ile Cys Glu His His Gln Met Leu Leu Gln Cys Ser Leu Val Cys Glu
1 5 10 15

Glu Cys Met Met
20
(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 

Lys Cys Ile Ile Gly Trp Tyr Thr Leu Thr Cys Tyr Met Ser Asp Arg
1 5 10 15

Pro Arg Met Glu
20

(2) INFORMATION FOR SEQ ID NO:68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

Ala Cys Thr Gln Asp Met Asn Trp Ile Thr Cys Pro Met Tyr Cys Glu
1 5 10 15

Val Leu Cys Phe
20

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

Val Cys Phe Tyr Phe Pro Phe Lys Met Met Cys His Met Glu Tyr Ile
1 5 10 15

Ala Tyr Glu Tyr
20
(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

Asp Ala Asn Cys Gly His Cys Thr Tyr Met Cys Ile Cys Lys Ile Met
1 5 10 15

Tyr Tyr Ile Ser
20 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

Trp His Arg His Val Ser Ser Pro Met Ser Cys Trp Trp Tyr Asp Gln
1 5 10 15

Cys Ala Val Ala
20 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

Cys Val Gln Ile Asp Phe Phe Thr Val Gln Cys Asn Ile Ser Ser His
1 5 10 15

Met Phe Leu Pro
20

I CLAIM:

1. A composition of matter comprising a 
plurality of cells containing a diverse population of 
expressible oligonucleotides, each of said 
oligonucleotides encoding a soluble peptide having
constrained secondary structure in solution, wherein each
of said oligonucleotides is operationally linked to 
expression elements, said expressible oligonucleotides 
having a desirable bias of random codon sequences. 

2. The composition of claim 1, wherein said 
oligonucleotides have more than one codon encoding an 
amino acid capable of forming a covalent bond.

3. The composition of claim 2, wherein said
amino acid is an amino acid selected from the group
consisting of cysteine, glutamic acid, lysine, leucine or
tyrosine.

4. The composition of claim 2, wherein said
oligonucleotide is selected from the group consisting of
TCLREEFILQCYIVMIEDWY, ICEHHQMLLQCSLVCEECMM,
KCIIGWYTLTCYMSDRPRME, ACTQDMNWITCPMYCEVLCF,
VCFYFPFKMMCHMEYIAYEY, DANCGHCTYMCICKIMYYIS,
WHRHVSSPMSCWWYDQCAVA and CVQIDFFTVQCNISSHMFLP.

5. The composition of claim 1, wherein said
cells are procaryotes.

6. The composition of claim 4, wherein said 
procaryotic cells are E. coli.

7. The composition of claim 1, wherein said 
expressible oligonucleotides are expressed as peptide 
fusion proteins on the surface of a filamentous 
bacteriophage.

8. A composition of matter comprising a
plurality of cells containing a diverse population of
expressible oligonucleotides, each of said
oligonucleotides encoding a soluble peptide having
constrained secondary structure in solution, wherein each
of said oligonucleotides is operationally linked to
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences
produced from random combinations of first and second
oligonucleotide precursor populations, each or either of
said first and second precursor having a desirable bias
of random codon sequences.

9. The composition of claim 8, wherein said 
first or second precursor oligonucleotides are biased.

10. The composition of claim 8, wherein said
first and second precursor oligonucleotides are biased.

11. The composition of claim 8, wherein said 
first or second precursor oligonucleotides have more than 
one codon encoding an amino acid capable of forming a 
covalent bond. 

12. The composition of claim 8, wherein said 
first and second precursor oligonucleotides have at least 
one codon encoding an amino acid capable of forming a 
covalent bond. 



13. The composition of claim 8, wherein said
oligonucleotide is selected from the group consisting of
TCLREEFILQCYIVMIEDWY, ICEHHQMLLQCSLVCEECMM,
KCIIGWYTLTCYMSDRPRME, ACTQDMNWITCPMYCEVLCF,
VCFYFPFKMMCHMEYIAYEY, DANCGHCTYMCICKIMYYIS,
WHRHVSSPMSCWWYDQCAVA and CVQIDFFTVQCNISSHMFLP.

14. The composition of claim 11 or 12, wherein
said amino acid is an amino acid selected from the group
consisting of cysteine, glutamic acid, lysine, leucine or
tyrosine.

15. The composition of claim 8, wherein said 
cells are procaryotes. 

16. The composition of claim 15, wherein said 
procaryotic cells are E. coli.

17. The composition of claim 8, wherein said 
expressible oligonucleotides are expressed as peptide 
fusion proteins on the surface of a filamentous 
bacteriophage.

18. A kit for the preparation of vectors
useful for the expression of a diverse population of
random soluble peptides having constrained secondary
structure in solution, said peptides being generated from
combined first and second precursor oligonucleotides when
combined having a desirable bias of random codon
sequences, comprising: two vectors: a first vector having
a cloning site for said first precursor oligonucleotides
and a pair of restriction sites for operationally
combining first precursor oligonucleotides with second
precursor oligonucleotides; and a second vector having a
cloning site for said second precursor oligonucleotides
and a pair of restriction sites complementary to those on
said first vector, one or both vectors containing
expression elements capable of being operationally linked
to said combined first and second precursor
oligonucleotides.

19. The kit of claim 18, wherein said vectors 
are in a filamentous bacteriophage. 

20. The kit of claim 18, wherein said 
filamentous bacteriophage are M13. 

21. The kit of claim 18, wherein said vectors
are plasmids or phagemids.

22. The kit of claim 18, wherein said first or
second precursor oligonucleotides are biased toward a
predetermined sequence.

23. The kit of claim 18, wherein said first 
and second precursor oligonucleotides are biased toward a 
predetermined sequence. 

24. The kit of claim 18, wherein said first or 
second precursor oligonucleotides have more than one 
codon encoding an amino acid capable of forming a 
covalent bond. 

25. The kit of claim 18, wherein said first 
and second precursor oligonucleotides have at least one 
codon encoding an amino acid capable of forming a 
covalent bond. 

26. The kit of claim 24 or 25, wherein said
amino acid is an amino acid selected from the group
consisting of cysteine, glutamic acid, lysine, leucine or
tyrosine.

27. A cloning system for expressing
oligonucleotides encoding random, soluble peptides having
constrained secondary structure in solution, said
oligonucleotides being generated from a desirable bias of
random codon sequences, comprising a vector having a pair
of restriction sites so as to allow the operational
combination of said oligonucleotides into a contiguous
oligonucleotide encoding said soluble peptide having
constrained secondary structure in solution.

28. The cloning system of claim 27, wherein
said oligonucleotides have more than one codon encoding
an amino acid capable of forming a covalent bond.

29. A cloning system for expressing
oligonucleotides encoding random, soluble peptides having
constrained secondary structure in solution, said
oligonucleotides being generated from diverse populations
of combined first and second precursor oligonucleotides
each or either having a desirable bias of random codon
sequences, comprising: a set of first vectors having a
desirable bias of random codon sequences and a second set
of vectors having a diverse population of second
precursor oligonucleotides having a desirable bias of
random codon sequences, said first and second vectors
each having a pair of restriction sites so as to allow
the operational combination of said oligonucleotides into
a contiguous oligonucleotide encoding said soluble
peptide having constrained secondary structure in
solution.

30. The composition of claim 29, wherein said 
first or second precursor oligonucleotides are biased. 

31. The composition of claim 29, wherein said 
first and second precursor oligonucleotides are biased. 

32. The cloning system of claim 29, wherein
said first or second precursor oligonucleotides have more
than one codon encoding an amino acid capable of forming
a covalent bond.

33. The cloning system of claim 29, wherein
said first and second precursor oligonucleotides have at
least one codon encoding an amino acid capable of forming
a covalent bond.

34. The cloning system of claim 32 or 33, 
wherein said amino acid is an amino acid selected from 
the group consisting of cysteine, glutamic acid, lysine, 
leucine or tyrosine. 

35. The cloning system of claim 29, wherein 
said combined first and second vectors is through a pair 
of restriction sites. 

36. The cloning system of claim 29, wherein
said expressible oligonucleotides are expressed as
peptide fusion proteins on the surface of a filamentous
bacteriophage.

37. A vector comprising an oligonucleotide, 
said oligonucleotide having a desirable bias of random 
codon sequences, and more than one codon encoding an 
amino acid capable of forming a covalent bond. 

38. A vector of claim 37, wherein said amino
acid is an amino acid selected from the group consisting
of cysteine, glutamic acid, lysine, leucine or tyrosine.

39. An isolated, soluble peptide having a
constrained secondary structure in solution.

40. An expressible oligonucleotide produced by
the cloning system of claim 29.

41. A host cell containing the cloning system 
of claim 29. 

42. A host cell containing the vector of claim
38.

43. A method of isolating a soluble peptide
having a constrained secondary structure in solution,
which comprises growing said host cell of claim 41 or 42
under suitable conditions favoring expression of said
peptide, and isolating said peptide so produced.

44. A method of constructing a diverse
population of vectors containing combined first and
second precursor oligonucleotides, wherein each or either
precursor oligonucleotides has a desirable bias of random
codon sequences, and capable of expressing said combined
oligonucleotides as random, soluble peptides having
constrained secondary structure in solution, comprising
the steps of:

(a) operationally linking sequences from a
diverse population of first precursor
oligonucleotides having a desirable bias
of random codon sequences to a first
vector;

(b) operationally linking sequences from a
diverse population of second precursor
oligonucleotides having a desirable bias
of random codon sequences to a second
vector;

(c) wherein said first or second, or first and
second precursor oligonucleotides have at
least one codon capable of forming a
covalent bond.

(d) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second precursor
oligonucleotides are joined together into
a population of combined vectors capable
of being expressed.

45. The method of claim 44, wherein said amino
acid is an amino acid selected from the group consisting
of cysteine, glutamic acid, lysine, leucine or tyrosine.

46. The method of claim 44, wherein steps (a)
through (d) are repeated two or more times.
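
The combinatorial logic of steps (a) through (d) of claim 44 can be sketched in code. This is an illustration under stated assumptions, not the claimed laboratory procedure: the "desirable bias" is modeled as NNK-style codons (third base G or T), the covalent-bond-forming residue is modeled as a cysteine codon (TGT) placed at a fixed index, and joining is modeled as simple concatenation of every first precursor with every second precursor. All function names and parameters are hypothetical.

```python
import itertools
import random

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

CYS_CODON = "TGT"  # assumption: cysteine stands in for the bond-forming residue


def random_codon(rng: random.Random) -> str:
    """One biased random codon; NNK (third base G or T) is an assumed bias."""
    return rng.choice(BASES) + rng.choice(BASES) + rng.choice("GT")


def make_precursor(rng: random.Random, n_random: int, cys_index: int) -> str:
    """Precursor oligonucleotide: n_random biased codons plus one cysteine codon."""
    codons = [random_codon(rng) for _ in range(n_random)]
    codons.insert(cys_index, CYS_CODON)
    return "".join(codons)


def combine(first_population: list[str], second_population: list[str]) -> list[str]:
    """Step (d), modeled as joining each first precursor to each second precursor."""
    return [f + s for f, s in itertools.product(first_population, second_population)]


def translate(oligo: str) -> str:
    """Translate an in-frame oligonucleotide; '*' marks a stop codon."""
    return "".join(CODON_TABLE[oligo[i:i + 3]] for i in range(0, len(oligo), 3))


if __name__ == "__main__":
    rng = random.Random(0)
    first = [make_precursor(rng, 4, 1) for _ in range(3)]   # step (a)
    second = [make_precursor(rng, 4, 0) for _ in range(2)]  # step (b)
    for oligo in combine(first, second):                    # step (d)
        print(oligo, translate(oligo))
```

Because each precursor carries one cysteine codon, every combined oligonucleotide in this model encodes at least two cysteines. The diversity of the combined population is the product of the two precursor population sizes, which is the point of building the library from two halves.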
| 10
AATGCTACTA 
ATAGCTAAAC 
CGTTCGCAGA 
GTTGCATATT 
TCTGCAAAAA 
TTGGAGTTTG 
TCTTTCGGGC 
CAGGGTAAA6 
TTTGAGGG6G 
AAACATTTTA 
GGTTTTTATC 
AATTCCTTTT 
ATGAATCTTT 
TCTTCCCAAC 
CAAT6ATTAA 
CTCGTCAGGG 
AATATCCGGT 
TGTACACCGT 
GTCTGC6CCT 
CAGGCGATGA 
CAAAGATGAG 
GTGGCATTAC 
CAAAGCCTCT 
CGATCCCGCA 
T6CGTGGGCG 
ATTCACCTCG 
TTTTTGGAGA 
TATTCTCACT 
TTTACTAACG 
CT6TGGAATG 
TGGGTTCCTA 
TCTGAGG6TG 
ATTCCGGGCT 
AACCCCGCTA 
CAGAATAATA 
CAAGGCACTG 
TATGACGCTT 
GATCCATTCG 
GCTGGCGGCG 
GGC6GTTCTG 
GATTTTGATT 
GAAAAC6CGC 
GCTGCTATCG 
6GTGATTTTG 
TTAATGAATA 
TTT6TCTTTA 
TTCCGTGGTG 
TTTGCTAACA 
TATTATTGC6 
TTAAAAAGGG 
GGCTTAACTC 
TTGTTCAGGG 
TCTCTGTAAA 
ATTGGGATAA 
CTCGTTAGCG 
CCTGATTTAA 
CTTA6AATAC 
TCCTACGATG 
ACCCGTTCTT 
AAATTAGGAT 
CGTTCT6CAT 
TTTGTCGGTA 
GTTGGCGTTG 
ACTGGTAAGA 



| 20
CTATTAGTAG 
AGGTTATTGA 
ATTGGGAATC 
TAAAACAT6T 
TGACCTCTTA 
CTTCC6GTCT 
TTCCTCTTAA 
ACCTGATTTT 
ATTCAATGAA 
CTATTACCCC 
GTCGTCTG6T 
GGCGTTAT6T 
CTACCTGTAA 
GTCCTGACTG 
AGTTGAAATT 
CAAGCCTTAT 
TCTTGTCAAG 
TCATCTGTCC 
CGTTCCGGCT 
TACAAATCTC 
TGTTTTAGTG 
GTATTTTACC 
GTAGCCGTTG 
AAAGCGGCCT 
ATG6TTGTTG 
AAAGCAAGCT 
HTTCAACGT 
CCGCTGAAAC 
TCTGGAAAGA 
CTACAG6C6T 
TTGGGCTTGC 
GCGGTTCTGA 
ATACTTATAT 

A-TGCTA-A-TeC" 

GGTTCCGAAA 

ACCCCGTTAA 

ACTGGAACGG 

TTT6TGAATA 

GCTCTGGTG6 

AGGGTGGCGG 

ATGAAAAGAT 

TACAGTCT6A 

ATGGTTTCAT 

CTGGCTCTAA 

ATTTCC6TCA 

6CGCTGGTAA 

TCTTTGCGTT 

TACTGCGTAA 

TTTCCTCGGT 

CTTCGGTAAG 

AATTCTTGTG 

TGTTCAGTTA 

GGCT6CTATT 

ATAATATGGC 

TTGGTAAGAT 

GGCTTCAAAA 

CG6ATAAGCC 

AAAATAAAAA 

GGAATGATAA 

GGGATATTAT 

TAGCT6AACA 

CTTTATATTC 

TTAAATATGG 

ATTTGTATAA 



| 30
AATTGATGCC 
CCATTTGC6A 
AACTGTTACA 
TGAGCTACAG 
TCAAAAGGAG 
GGTTCGCTTT 
TCTTTTTGAT 
TGATTTATGG 
TATTTATGAC 
CTCTGGCAAA 
AAACGAGGGT 
ATCTGCATTA 
TAATGTTGTT 
GTATAATGAG 
AAACCATCTC 
TCACTGAATG 
ATTACTCTTG 
TCTTTCAAAG 
AA6TAACATG 
CGTTGTACTT 
TATTCTTTCG 
C6TTTAATGG 
CTACCCTCGT 
TTAACTCCCT 
TCATTGTCGG 
GATAAACCGA 
GAAAAAATTA 
TGTTGAAAGT 
CGACAAAACT 
TGTAGTTTGT 
TATCCCTGAA 
GG6TGGCGGT 
CAACCCTCTC 
TTCTCTTGA6 
TAG6CAGGGG 
AACTTATTAC 
TAAATTCAGA 
TCAAGGCCAA 
TGGTTCTGGT 
CTCTGAGGGA 
GGCAAACGCT 
CGCTAAAGGC 
TGGTGAC6TT 
TTCCCAAATG 
ATATTTACCT 
ACCATAT6AA 
TCTTTTATAT 
TAAGGA6TCT 
TTCCTTCTGG 
ATAGCTATT6 
GGTTATCTCT 
ATTCTCCCGT 
TTCATTTTTG 
TGTTTATTTT 
TCAG6ATAAA 
CCTCCCGCAA 
TTCTATATCT 
CGGCTTGCTT 
GGAAAGACAG 
CTTCCTTGTT 
TCTTGTTTAT 
TCTTATTACT 
CGATTCTCAA 
CGCATATGAT 



| 40
ACCTT7TCAG 
AATGTATCTA 
TGGAATGAAA 
CACCAGATTC 
CAATTAAAGG 
GAAGCTC6AA 
GCAATCCGCT 
TCATTCTCGT 
6ATTCCGCAG 
ACTTCTTTTG 
TATGATAGTG 
GTTGAATGTG 
CCGTTAGTTC 
CCAGTTCTTA 
AAGCCCAATT 
AGCAGCTTT6 
ATGAAGGTCA 
TTGGTCAGTT 
GAGCAGGTC6 
TGTTTCGC6C 
CCTCTTTCGT 
AAACTTCCTC 
TCCGATGCTG 
GCAAGCCTCA 
CGCAACTATC 
TACAATTAAA 
TTATTCGCAA 
TGTTTA6CAA 
TTA6ATCGTT 
ACT6GTGACG 
AAT6A6GGTG 
ACTAAACCTC 
GACGGCACTT- 
GAGTCTCAGC 
GCATTAACTG 
CAGTACACTC 
GACTGCGCTT 
TCGTCTGACC 
GGCGGCTCTG 
GGCGGTTCCG 
AATAAGGGGG 
AAACTTGATT 
TCCGGCCTTG 
GGTCAAGTCG 
TCCCTCCCTC 
TTTTCTATT6 
GTTGCCACCT 
TAATCAT6CC 
TAACTTTGTT 
CTATTTCATT 
CTGATATTA6 
CTAATGCGCT 
AC6TTAAACA 
GTAACTG6CA 
ATTGTAGCTG 
GTCG6GAGGT 
GATTTGCTTG 
GTTCTCGATG 
CCGATTATTG 
CAGGACTTAT 
TGTCGTCGTC 
GGCTCGAAAA 
TTAAGCCCTA 
ACTAAACAG6 



| 50
CTCGCGCCCC 
ATGGTCAAAC 
CTTCCAGACA 
AGCAATTAAG 
TACTCTCTAA 
TTAAAACGCG 
TTGCTTCTGA 
TTTCTGAACT 
TATTG6ACGC 
CAAAAGCCTC 
TTGCTCTTAC 
GTATTCCTAA 
GTTTTATTAA 
AAATCGCATA 
TACTACTCGT 
TTACGTTGAT 
6CCAGCCTAT 
CGGTTCCCTT 
CGGATTTCGA 
TT6GTATAAT 
TTTAGGTTGG 
ATGAAAAAGT 
TCTTTCGCTG 
6CGACCGAAT 
GGTATCAAGC 
GGCTCCTTTT 
TTCCTTTAGT 
AACCCCATAC 
AC6CTAACTA 
AAACTCA6TG 
GTGGCTCTGA 
CTGAGTACG6 
ATCGGGGT-6G 
CTCTTAATAC 
TTTATACG6G 
CTGTATCATC 
TCCATTCTGG 
TCGGTCAACC 
AGGGTGGTGG 
GTGGT6GCTC 
CTATGACC6A 
CTGTCGCTAC 
CTAATG6TAA 
GTGACGGTGA 
AATCGGTTGA 
ATT6TGACAA 
TTATGTATGT 
AGTTCTTTTG 
GCCGTATCTG 
GTTTCTTGCT 
CGCTCAATTA 
TCCCT6TTTT 
AAAAATCGTT 
AATTAGGCTC 
GGTGCAAAAT 
TCGCTAAAAC 
CTATTGGGCG 
AGT6CGGTAC 
ATTGGTTTCT 
CTATTGTTGA 
TGGACAGAAT 
TGCCTCTGCC 
CTGTTGAGCG 
CTTTTTCTAG 



| 60
AAATGAAAAT 
TAAATCTACT 
CCGTACTTTA 
CTCTAAGCCA 
TCCTGACCTG 
ATATTTGAA6 
CTATAATAGT 
GTTTAAAGCA 
TATCCAGTCT 
TCGCTATTTT 
TATGCCTCGT 
ATCTCAACT6 
C6TAGATTTT 
AGGTAATTCA 
TCTGGTGTTT 
TTGGGTAATG 
GCGCCTGGTC 
ATGATT6ACC 
CACAATTTAT 
CGCTGGGGGT 
TGCCTTCGTA 
CTTTAGTCCT 
CTGAGGGTGA 
ATATCGGTTA 
TGTTTAAGAA 
GGAGCCTTTT 
TGTTCCTTTC 
AGAAAATTCA 
TGAGGGTTGT 
TTACGGTACA 
GG6TGGCGGT 
TGATACACCT 
TACTGAGCAA 
TTTCATGTTT 
CACT6TTACT 
AAAAGCCATG 
CTTTAATGAA 
TCCTGTCAAT 
CTCTGAGGGT 
TGGTTCCGGT 
AAATGCCGAT 
TGATTACGGT 
TGGTGCTACT 
TAATTCACCT 
ATGTCGCCCT 
AATAAACTTA 
ATTTTCTACG 
GGTATTCCGT 
CTTACTTTTC 
CTTATTATTG 
CCCTCTGACT 
TATGTTATTC 
TCTTATTTGG 
TGGAAAGACG 
AGCAACTAAT 
GCCTCGCGTT 
CGGTAATGAT 
TTGGTTTAAT 
ACATGCTCGT 
TAAACAGGCG 
TACTTTACCT 
TAAATTACAT 
TTG6CTTTAT 
TAATTATGAT
TCCGGTGTTT
AATTTAGGTC 
TGTCTTGCGA 
GAGGTTAAAA 
CAGCGTCTTA 
AGCGACGATT 
ATTAAAAAGG 
GTTTCATCAT 
6TAACTTGGT 
ACTGTTACTG 
6TTTTACGTG 
AATCCAAACA 
GATAATTCCG 
TTTAAAATTA 
TCTAATACTT 
AGTGCACCTA 
ACTGACCAGA 
TTTTCATTTG 
CTCACCTCTG 
GGGCTATCAG 
ATTCTTACGC 
ACTGGTCGTG 
CAAAAT6TAG 
CTGGATATTA 
ACTAATCAAA 
GGTGGCCTCA 
ATCCCTTTAA 
TACGTGCTCG 
TGTGGTG6TT 
CGCTTTCTTC 
6GGGCTCCCT 
TTTGGGT6AT 
GTT6GAGTCC 
TATCTCGGGC 
CAGGATTTTC 
CAG6CGGT6A 
GCGCCCAATA 
CGACAGGTTT 
CACTCATTAG 
TGTGA6C6GA 
GTAGGAGAGC 
A6TTTACAG6 
GTT6GTGCTA 
GCTGGCGTAA 
ATGGC6AATG 
AGTGCGATCT 
ACGAT6CGCC 
CCACGGA6AA 
AGGAAGGCCA 
TTAACAAAAA 
TTATACAATC 
CATGCTAGTT 
TGACCTGATA 
AGCTAGAACG 
TTT TGAAT CT 
AAATTTTTAT 
TGTniTGGT 
TTCTTTGCCT 
| 10



ATTCTTATTT 
AGAAGATGAA 
TTGGATTTGC 
AGGTAGTCTC 
ATCTAAGCTA 
TACAGAAGCA 
TAATTCAAAT 
CTTCTTTTGC 
ATTCAAA6CA 
TATATTCATC 
CTAATAATTT 
ATCAG6ATTA 
CTCCTTCTGG 
ATAACGHCG 
CTAAATCCTC 
AAGATATTTT 
TATTGATTGA 
CTGCTGGCTC 
TTTTATCTTC 
TTC6CGCATT 
TTTCAGGTCA 
TGACTGGTGA 
GTATTTCCAT 
CCAGCAA6GC 
GAA6TATTGC 
CTGATTATAA 
TCGGCCTCCT 
TCAAAGCAAC 
ACGCGCAGCG 
CCTTCCTTTC 
TTAGGGnCC 
GGTTCAC6TA 
ACGTTCTTTA 
TATTCTTTTG 
GCCTGCT6GG 
AGGGCAATCA 
CGCAAACCGC 
CCC6ACTGGA 
GCACCCCAGG 
TAACAATTTC 
TCGGCGGATC 
CAAGTGCTAC 
CCATAGGGAT 
TAGCGAA6AG 
GCGCTTTGCC 
TCCTGAGGCC 
CATCTACACC 
TCCGACGGGT 
GACGCGAATT 
TTTAACGCGA 
TTCCTGTTTT 
mCGATTAC 
GCCTTTGTAG 
GTTGAATATC 
TTACCTACAC 
CCTTGCGTTG 
ACAACCGATT 
TGCCTGTATG 
| 20



AACGCCTTAT 
GCTTACTAAA 
ATCAGCATTT 
TCAGACCTAT 
TCGCTATGTT 
AGGTTATTCA 
GAAATTGTTA 
TCAGGTAATT 
ATCAG6CGAA 
TGACGTTAAA 
TGATATGGn 
TATT6ATGAA 
TGGTTTCTTT 
GGCAAAGGAT 
AAATGTATTA 
AGATAACCTT 
GGGTTTGATA 
TCAGCGTGGC 
TGCTGGTGGT 
AAAGACTAAT 
GAAGGGTTCT 
ATCTGCCAAT 
GAGCGTTTTT 
CGATAGTTTG 
TACAAC6GTT 
AAACACTTCT 
GHTAGCTCC 
CATAGTAC6C 
TGACCGCTAC 
TCGCCACGTT 
GATTTAGTGC 
GTGGGCCATC 
ATAGTGGACT 
ATTTATAAGG 
GCAAACCAGC 
GCTGTTGCCC 
CTCTCCCCGC 
AAGCGGGCAG 
CTTTACACTT 
ACACAGGAAA 
CTAGGCTGAA 
T6A6TACATT 
TAAATTATTC 
GCCCGCACCG 
TGGTTTCCGG 
GATACGGTCG 
AACGTAACCT 
TGTTACTCGC 
ATTTTTGATG 
ATTTTAACAA 
TGGGGCTTTT 
CGTTCATCGA 
ATCTCTCAAA 
ATATTGATGG 
ATTACTCAGG 
AAATAAAGGC 
TAGCTTTATG 
ATTTATTGGA 
| 30



TTATCACACG 
ATATATTTGA 
ACATATAGTT 
GATTTTGATA 
TTCAAGGATT 
CTCACATATA 
AATGTAATTA 
GAAATGAATA 
TCCGTTAnG 
CCTGAAAATC 
GGTTCAATTC 
TTGCCATCAT 
6TTCCGCAAA 
TTAATACGAG 
TCTATTGACG 
CCTCAATTCC 
TTTGAGGTTC 
ACTGTTGCAG 
TCGTTCGGTA 
AGCCATTCAA 
ATCTCTGTTG 
GTAAATAATC 
CCTGTT6CAA 
AGTTCTTCTA 
AATTTGCGTG 
CAAGATTCTG 
CGCTCTGATT 
GCCCTGTAGC 
ACTTGCCAGC 
CGCCGGCTTT 
TTTACGGCAC 
GCCCTGATAG 
CTTGTTCCAA 
GATTTTGCCG 
GTGGACCGCT 
GTCTCGCTGG 
GCGTTGGCCG 
T6AGC6CAAC 
TATGCTTCCG 
CAGCTATGAC 
GGCGATGACC 
6GCTACGCTT 
AAAAAGTTTA 
ATCGCCCTTC 
CACCAGAAGC 
TCGTCCCCTC 
ATCCCATTAC 
TCACATTTAA 
GCGTTCCTAT 
AATATTAACG 
CTGATTATCA 
TTCTCTTGTT 
AATAGCTACC 
TGATTTGACT 
CATTGCATTT 
TTCTCCCGCA 
CTCTGAGGCT 
CGH 

| 40



6TCGGTATTT 
AAAAGTTTTC 
ATATAACCCA 
AATTCACTAT 
CTAAGGGAAA 
TTGATTTATG 
ATTTTGTTTT 
ATTCGCCTCT 
TTTCTCCCGA 
TACGCAATTT 
CTTCCATTAT 
CT6ATAATCA 
ATGATAATGT 
TTGTCGAATT 
GCTCTAATCT 
TTTCTACTGT 
AGCAAGGTGA 
GCGGTGTTAA 
TTTTTAATGG 
AAATATT6TC 
GCCAGAATGT 
CATTTCAGAC 
TGGCTGGCGG 
CTCA6GCAAG 
ATGGACAGAC 
6CGTACCGTT 
CCAACGAGGA 
GGCGCATTAA 
GCCCTAGCGC 
CCCCGTCAAG 
CTCGACCCCA 
ACGGTTTTTC 
ACTG6AACAA 
ATTTCGGAAC 
TGCTGCAACT 
TGAAAAGAAA 
ATTCATTAAT 
GCAATTAATG 
GCTC6TATGT 
CAG6ATGTAC 
CTGCTAAGGC 
GGGCTATGGT 
CGAGCAAGGC 
CCAACGATTG 
GGTGCCGGAA 
AAACTGGCAG 
GGTCAATCCG 
TGHGATGAA 
TGGTTAAAAA 
TTTACAATTT 
ACCGGGGTAC 
TGCTCCAGAC 
CTCTCCGGCA 
GTCTCCGGCC 
AAAATATATG 
AAAGTATTAC 
TTATTGCTTA 

| 50



CAAACCAT7A 
ACGCGTTCTT 
ACCTAAGCCG 
TGACTCTTCT 
ATTAATTAAT 
TACT6TTTCC 
CTT6ATGTTT 
GCGCGATTTT 
TGTAAAAGGT 
CTTTATTTCT 
TTAGAAGTAT 
6GAATATGAT 
TACTCAAACT 
6TTTGTAAAG 
ATTAGTTGTT 
TGATTTGCCA 
TGCTTTAGAT 
TACTGACCGC 
CGATGTTTTA 
TGTGCCACGT 
CCCTTTTATT 
GATTGAGCGT 
TAATATTGTT 
TGATGTTATT 
TCmiACTC 
CCTGTCTAAA 
AAGCACGTTA 
GCGCGGCGGG 
CCGCTCCin 
CTCTAAATCG 
AAAAACTTGA 
GCCCTTTGAC 
CACTCAACCC 
CACCATCAAA 
CTCTCAGGGC 
AACCACCCTG 
GCA6CTGGCA 
TGAGTTAGCT 
TGTGTGGAAT 
GAATTC6CAG 
TGCATTCAAT 
AGTAGTTATA 
TTCTTAACCA 
CGCAGCCTGA 
A6CTGGCTGG 
ATGCACGGTT 
CCGTTTGTTC 
AGCTGGCTAC 
AT6AGCTGAT 
AAATATTTGC 
ATATGATTGA 
TCTCAGGCAA 
TTAATTTATC 
TTTCTCACCC 
AGGGTTCTAA 
AG6GTCATAA 
ATTTTGCTAA 

| 60

| 10 | 20 | 30 | 40 | 50 | 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
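
A listing laid out this way, as rows of six 10-base blocks bracketed by start and end positions, lends itself to a mechanical consistency check. The following is a minimal sketch, assuming that layout and assuming that a digit 6 inside a block of a scanned copy stands for G (a common scanning confusion, though not a guaranteed one). The function names are illustrative.

```python
import re

# A block is ten characters drawn from the bases, plus the digit 6 that
# scanned copies often show in place of G (assumption).
BLOCK = re.compile(r"[ACGT6]{10}")


def parse_row(line: str) -> list[str]:
    """Extract the 10-base blocks from one listing row, mapping 6 to G.
    Position numbers and any other tokens are ignored."""
    return [tok.replace("6", "G") for tok in line.split() if BLOCK.fullmatch(tok)]


def renumber(rows: list[list[str]]) -> list[str]:
    """Rebuild rows with start and end positions computed from block lengths,
    rather than trusting the printed numbers."""
    lines = []
    start = 1
    for blocks in rows:
        end = start + sum(len(b) for b in blocks) - 1
        lines.append(f"{start} {' '.join(blocks)} {end}")
        start = end + 1
    return lines


if __name__ == "__main__":
    scanned = "1 AATGCTACTA CTATTAGTA6 AATTGATGCC ACCTTTTCA6 CTCGCGCCCC AAATGAAAAT 50"
    print(renumber([parse_row(scanned)])[0])
```

Recomputing the numbers from the block lengths avoids depending on the printed positions, which are the part of a scanned row most likely to be misread. Rows that yield fewer than six blocks can be flagged for manual comparison against the drawing sheet.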
TCCGGTGTTT
AATTTAGGTC 
TGTCTTGCGA 
6AGGTTAAAA 
CAGCGTCTTA 
AGC6ACGATT 
ATTAAAAAAG 
TGTTTCATCA 
T6TAACTTGG 
TACTGTTACT 
TGTTTTACGT 
TAATCCAAAC 
TGATAATTCC 
TTTTAAAATT 
GTCTAATACT 
TAGTGCACCT 
AACTGACCAG 
TTTTTCATTT 
CCTCACCTCT 
AGG6CTATCA 
TATTCTTACG 
TACTGGTC6T 
TCAAAATGTA 
TCTGGATATT 
TACTAATCAA 
CGGT66CCTC 
AATCCCTHA 
ATAC6T6CTC 
GTGTGGTGGT 
TCGCTTTCn 
G6G66CTCCC 
ATTTG6GTGA 
CGTTGGAGTC 
CTATCTCGGG 
ACAGGATTTT 
CCAGGCGGTG 
G6CGCCCAAT 
ACGACAGGTT 
TCACTCATTA 
TTGTGAGCGG 
TACGGCAGCC 
6ACCCA6ACT 
ACTGGCCGTC 
CCTTGCA6CA 
CCCTTCCCAA 
AGAAGCGGTG 
CCCCTCAAAC 
CATTACGGTC 
ATTTAATGTT 
TCCTATTGGT 
TTAACGTTTA 
TTATCAACCG 
CTTGTTT6CT 
GCTACCCTCT 
TTGACTGTCT 
GCATTTAAAA 
CCCGCAAAAG 
GA66CTTTAT 
| 10



ATTCTTATTT 
AGAAGATGAA 
TTGGATTTGC 
AGGTAGTCTC 
ATCTAAGCTA 
TACAGAAGCA 
GTAATTCAAA 
TCTTCTTTTG 
TATTCAAAGC 
GTATATTCAT 
GCTAATAATT 
AATCAGGATT 
GCTCCTTCTG 
AATAACGTTC 
TCTAAATCCT 
AAAGATATTT 
ATATTGATTG 
GCTGCTGGCT 
6TTTTATCTT 
6TTCGCGCAT 
CTTTCAGGTC 
GTGACTGGTG 
GGTATTTCCA 
ACCAGCAA6G 
AGAAGTATT6 
ACTGATTATA 
ATCGGCCTCC 
GTCAAAGCAA 
TACGCGCA6C 
CCCTTCCTTT 
TTTAGG6TTC 
TGGnCACGT 
CACGTTCTTT 
CTATTCTTTT 
CGCCTGCTGG 
AAGGGCAATC 
ACGCAAACCG 
TCCCGACT6G 
GGCACCCCAG 
ATAACAATTT 
GCIGGAHGT 
CCA6AATTCC 
GTHTACAAC 
CACCCCCCTT 
CAGTTGCGCA 
CC6GAAAGCT 
TGGCA6ATGC 
AATCCGCCGT 
6AT6AAAGCT 
TAAAAAATGA 
CAATTTAAAT 
GGGTACATAT 
CCAGACTCTC 
CCGGCATTAA 
CCGGCCTTTC 
TATATGA6G6 
TATTACAGGG 
T6CTTAATTT 
| 20



AACGCCTTAT 
ATTAACTAAA 
ATCAGCATTT 
TCAGACCTAT 
TCGCTATGTT 
AGGTTATTCA 
TGAAATTGTT 
CTCAGGTAAT 
AATCAGGCGA 
CTGACGTTAA 
TTGATATGGT 
ATATTGATGA 
GTG6TTTCTT 
GGGCAAAGGA 
CAAAT6TATT 
TAGATAACCT 
AGGGTTTGAT 
CTCAGCGT6G 
CTGCTGGTGG 
TAAAGACTAA 
AGAAGGGTTC 
AATCTGCCAA 
TGAGCGTTTT 
CCGATAGTTT 
CTACAACGGT 
AAAACACTTC 
T6TTTAGCTC 
CCATAGTACG 
GTGACCGCTA 
CTC6CCAC6T 
CGATTTAGTG 
A6TGGGCCAT 
AATAGTG6AC 
GATHATAAG 
6GCAAACCA6 
AGCTGTTGCC 
CCTCTCCCCG 
AAAGCGGGCA 
GCTTTACACT 
CACAC6CCAA 
TATTACTCGC 
ATCCG6AATG 
6TCGTGACT6 
TCGCCAGCTG 
GCCTGAAT6G 
GGCTGGAGTG 
ACGGTTACGA 
TTGTTCCCAC 
GGCTACA6GA 
GCTGATTTAA 
ATTTGCTTAT 
GATTGACAT6 
A6GCAATGAC 
TTTATCAGCT 
TCACCCTTTT 
TTCTAAAAAT 
TCATAATGTT 
TGCTAATTCT 
| 30



TTATCACACG 
ATATATTT6A 
ACATATAGTT 
6ATTTTGATA 
TTCAA6GATT 
CTCACATATA 
AAATGTAATT 
TGAAATGAAT 
ATCCGTTATT 
ACCTGAAAAT 
TGGTTCAATT 
ATTGCCATCA 
TGTTCCGCAA 
TTTAATACGA 
ATCTATTGAC 
TCCTCAATTC 
ATTTGAGGTT 
CACTGTTGCA 
TTCGTTCGGT 
TAGCCATTCA 
TATCTCTGTT 
TGTAAATAAT 
TCCTGTTGCA 
GAGTTCTTCT 
TAATTTGCGT 
TCAAGATTCT 
CC6CTCTGAT 
CGCCCTGTAG 
CACTTGCCAG 
TCGCCGGCTT 
CTTTACGGCA 
CGCCCTGATA 
TCTTGTTCCA 
GGATTTTGCC 
CGTGGACCGC 
CGTCTC6CTG 
CGCGTTGGCC 
GTGAGCGCAA 
TTATGCTTCC 
GGAGACAGTC 
TGCCCAACCA 
AGTGTTAATT 
GGAAAACCCT 
GCGTAATAGC 
CGAATG6CGC 
CGATCTTCCT 
TGCGCCCATC 
GGAGAATCCG 
AGGCCAGACG 
CAAAAATTTA 
ACAATCTTCC 
CTAGTTTTAC 
CTGATAGCCT 
AGAACGGTTG 
GAATCTTTAC 
TTTTATCCTT 
TTTGGTACAA 
TTGCCTTGCC 
| 40



GTCGGTATTT 
AAAAGTTTTC 
ATATAACCCA 
AATTCACTAT 
CTAAGGGAAA 
TTGATTTATG 
AATTTT6TTT 
AATTCGCCTC 
GTTTCTCCCG 
CT.AC6CAATT 
CGTTCCATAA 
TCT6ATAATC 
AATGATAAT6 
GTTGTCGAAT 
GGCTCTAATC 
CTTTCTACTG 
CAGCAAGGTG 
GGCGGTGTTA 
ATTTTTAATG 
AAAATATTGT 
GGCCAGAATG 
CCATTTCAGA 
ATGGCTGGCG 
ACTCAGGCAA 
GATGGACAGA 
GGCGTACCGT 
TCCAACGAGG 
CGGCGCATTA 
CGCCCTAGCG 
TCCCCGTCAA 
CCTCGACCCC 
GACGGTTTTT 
AACTGGAACA 
GATTTCG6AA 
TTGCTGCAAC 
GTGAAAAGAA 
GATTCATTAA 
CGCAATTAAT 
GGCTCGTATG 
ATAATGAAAT 
GCCATGGCCG 
CTAGAACGCG 
GGCGTTACCC 
GAAGAGGCCC 
TTTGCCTG6T 
GAGGCCGATA 
TACACCAACG 
TCGGGTTGTT 
CGAATTATTT 
AC6CGAATTT 
TGTTTTTGGG 
GATTACCGTT 
TTGTAGATCT 
AATATCATAT 
CTACACATTA 
GCGTTGAAAT 
CCGATTTAGC 
TGTATGATTT 
| 50



CAAACCATTA 
TCGCGTTCTT 
ACCTAA6CC6 
TGACTCTTCT 
AiTAATTAAT 
TACTGTTTCC 
TCTTGATGTT 
TGCGCGATTT 
ATGTAAAAGG 
TCTTTATTTC 
TTCAGAAGTA 
AG6AATATGA 
TTACTCAAAC 
TGTTTGTAAA 
TATTAGTTGT 
TT6ATTTGCC 
ATGCTTTA6A 
ATACTGACCG 
GCGATGTTTT 
CTGTGCCACG 
TCCCTTTTAT 
CGATTGAGCG 
GTAATATTGT 
GTGATGTTAT 
CTCTTTTACT 
TCCTGTCTAA 
AAAGCACGTT 
AGCGCGGCGG 
CCCGCTCCTT 
GCTCTAAATC 
AAAAAACTTG 
CGCCCTTTGA 
ACACTCAACC 
CCACCATCAA 
TCTCTCAGGG 
AAACCACCCT 
TGCAGCTGGC 
GTGAGTTAGC 
TTGTGTGGAA 
ACCTATTGCC 
AGCTCGTGAT 
TAAGCTTGGC 
AACTTAATCG 
GCACCGATCG 
TTCCGGCACC 
CGGTCGTCGT 
TAACCTATCC 
ACTCGCTCAC 
TTGATGGCGT 
TAACAAAATA 
GCTTTTCTGA 
CATCGATTCT 
CTCAAAAATA 
TGATGGTGAT 
CTCAGGCATT 
AAAGGCTTCT 
TTTATGCTCT 
ATTGGACGTT 
| 60
| 10
1 AAT6CTACTA 
51 ATA6CTAAAC 
121 CGTTC6CA6A 
181 GTT6CATATT 
2m TCT6CAAAAA 
301 TIGGAGTHG 
351 TCTTTC6GGC 
421 CAGGGTAAAG 
A81 TTT6AGGG6G 
541 AAACATTTTA 
601 66TTTTTATC 
651 AATTCCTHT 
721 ATGAATCTTT 
781 TCTTCCCAAC 
841 CAATGATTAA 
901 CTC6TCAGG6 
961 AATATCCGGT 
1021 TGTACACC6T 
1081 GTCTGCGCCT 
1141 CAG6CGATGA 
1201 CAAAGATGAG 
1251 GTGGCATTAC 
1321 CAAAGCCTCT 
1381 CGATCCCGCA 
1441 TGCGTGGGCG 
1501 ATTCACCTCG 
1561 TTTTT6GAGA 
1621 TATTCTCACT 
1581 TTTACTAACG 
1741 CTGTGGAATG 
1801 TGGGTTCCTA 
1851 TCT6AGGGT6 
1921 ATTCCGGGCT 
1981 AACCCCGCTA 
2041 CA6AATAATA 
2101 CAAGGCACTG 
2161 TATGACGCTT 
2221 GATCCATTCG 
2281 GCTGGCGGC6 
2341 GGCGGTTCT6 
2401 GATTTTGATT 
2461 GAAAACGCGC 
2521 GCTGCTATCG 
2581 GGIGATTHG 
2641 TTAAT6AATA 
2701 TTTGTCTTTA 
2761 nCCGTGGTG 
2821 TTTGCTAACA 
2881 TATTATTGC6 
2941 TTAAAAAGGG 
3001 GGCTTAACTC 
3061 TTGTTCA6G6 
3121 TCTCTGTAAA 
3181 ATT6G6ATAA 
3241 CTC6TTAGCG 
3301 CTTGATHAA 
3361 CTTAGAATAC 
3421 TCCTACGAT6 
3481 ACCCGTTCTT 
3541 AAATTAGGAT 
3601 CGTTCTGCAT 
3651 TTTGTCGGTA 
3721 GTTGGC6TT6 



| 20
CTATTAGTAG 
AGGTTATTGA 
ATTGGGAATC 
TAAAACATGT 
T6ACCTCTTA 
CTTCCGGTCT 
TTCCTCTTAA 
ACCTGATTTT 
ATTCAATGAA 
CTATTACCCC 
GTCGTCT6GT 
GGCGTTATGT 
CTACCTGTAA 
GTCCTGACTG 
AGTTGAAATT 
CAA6CCTTAT 
TCTTGTCAAG 
TCATCTGTCC 
CGTTCCG6CT 
TACAAATCTC 
TGTTTTAGTG 
6TATTTTACC 
GTAGCCGTT6 
AAA6CGGCCT 
ATGGTTGTTG 
AAAGCAA6CT 
TTTTCAACGT 
CCGCTGAAAC 
TCTGGAAAGA 
CTACA6GCGT 
TTGGGCTTGC 
GCG6TTCTGA 
ATACTTATAT 
ATCGTAATGC 
GGTTCCGAAA 
ACCCCGTTAA 
ACTGGAACGG 
TTTGTGAATA 
GCTCTGGTGG 
AGGGTGGCGG 
ATGAAAAGAT 
TACAGTCTGA 
ATGGTTTCAT 
CTGGCTCTAA 
ATTTCCGTCA 
GCGCT6GTAA 
TCTTTGCGTT 
TACTGCGTAA 
TTTCCTCGGT 
CTTCGGTAAG 
AATTCTTGTG 
TGTTCAGTTA 
GGCTGCTATT 
ATAATATGGC 
TTGGTAAGAT 
GGCTTCAAAA 
CGGATAAGCC 
AAAATAAAAA 
GGAATGATAA 
GGGATATTAT 
TAGCTGAACA 
CTTTATATTC 
TTAAATATGG 



| 30
AATTGATGCC 
CCATTTGCGA 
AACT6TTACA 
TGAGCTACAG 
TCAAAAGGAG 
G6TTCGCTTT 
TCTTTTTGAT 
TGATTTATGG 
TATTTAT6AC 
CTCTGGCAAA 
AAACGAGGGT 
ATCTGCATTA 
TAATGTTGTT 
6TATAATGAG 
AAACCATCTC 
TCACTGAATG 
ATTACTCTTG 
TCTTTCAAAG 
AAGTAACATG 
C6TTGTACTT 
TATTCTTTCG 
C6TTTAATGG 
CTACCCTCGT 
TTAACTCCCT 
TCATTGTCGG 
GATAAACCGA 
GAAAAAATTA 
TGTTGAAAGT 
CGACAAAACT 
TGTAGTTTGT 
TATCCCT6AA 
6GGTGGC6GT 
CAACCCTCTC 
TTCTCTTGA6 
TAGGCA6GGG 
AACTTATTAC 
TAAATTCAGA 
TCAAG6CCAA 
TGGTTCTG6T 
CTCTGA6G6A 
GGCAAACGCT 
CGCTAAAGGC 
TGGTGACGTT 
TTCCCAAATG 
ATATTTACCT 
ACCATAT6AA 
TCTTTTATAT 
TAAGGAGTCT 
TTCCTTCTGG 
ATAGCTATTG 
GGTTATCTCT 
ATTCTCCCGT 
TTCATTTTTG 
TGTTTATTTT 
TCA6GATAAA 
CCTCCCGCAA 
TTCTATATCT 
CGGCTTGCTT 
GGAAAGACAG 
TTTTCTTGTT 
TGTTGTTTAT 
TCTTATTACT 
CGATTCTCAA 



| 40
ACCTTTTCA6 
AATGTATC7A 
TG6AAT6AAA 
CACCAGATTC 
CAATTAAAGG 
6AAGCTCGAA 
GCAATCCGCT 
TCATTCTC6T 
6ATTCCGCA6 
ACTTCTTTTG 
TATGATAGTG 
GTTGAATGTG 
CCGTTAGTTC 
CCAGTTCTTA 
AAGCCCAATT 
AGCAGCTTTG 
ATGAAGGTCA 
TTGGTCAGTT 
GAGCAGGTC6 
TGTTTCGCGC 
CCTCTTTCGT 
AAACTTCCTC 
TCC6ATGCTG 
GCAAGCCTCA 
CGCAACTATC 
TACAATTAAA 
TTATTCGCAA 
TGTTTA6CAA 
TTAGATC6TT 
ACT6GTGACG 
AATGAGGGTG 
ACTAAACCTC 
GACGGCACXT 
GAGTCTCAGC 
GCATTAACTG 
CAGTACACTC 
6ACTGCGCTT 
TCGTCTGACC 
GGCGGCTCTG 
GGCGGTTCCG 
AATAA6GGGG 
AAACTTGATT 
TCCGGCCTT6 
GCTCAAGTCG 
TCCCTCCCTC 
TTTTCTATTG 
GTT6CCACCT 
TAATCATGCC 
TAACTTTGTT 
CTATTTCATT 
CT6ATATTAG 
CTAATGCGCT 
ACGTTAAACA 
GTAACTGGCA 
ATTGTAGCTG 
GTCGGGAGGT 
GATTTGCTTG 
GTTCTCGATG 
CCGATTATTG 
CAGGACTTAT 
TGTCGTCGTC 
GGCTCGAAAA 
TTAAGCCCTA 



| 50

CTCGCGCCCC 

ATGGTCAAAC 

CTTCCAGACA 

AGCAATTAA6 

TACTCTCTAA 

TTAAAACGCG 

TTGCTTCTGA 

TTTCTGAACT 

TATTGGACGC 

CAAAAGCCTC 

TTGCTCTTAC 

GTATTCCTAA 

6TTTTATTAA 

AAATCGCATA 

TACTACTCGT 

TTACGTTGAT 

GCCAGCCTAT 

C6GTTCCCTT 

CGGATTTCGA 

TTGGTATAAT 

TTTAGGTTGG 

ATGAAAAAGT 

TCTTTCGCTG 

GCGACCGAAT 

GGTATCAAGC 

GGCTCCTTTT 

TTCCTTTA6T 

AACCCCATAC 

ACGCTAACTA 

AAACTCAGTG 

GTG6CTCTGA 

CTGAGTACGG 

ATCCGGCTGG 

CTCTTAATAC 

TTTATACGGG 

CT6TATCATC 

TCCATTCTGG 

TGCCTCAACC 

AGGGTGGTGG 

GTGGTGGCTC 

CTATGACCGA 

CTGTCGCTAC 

CTAATGGTAA 

GTGACGGT6A 

AATC6GTTGA 

ATTGTGACAA 

TTATGTATGT 

AGTTCTTTTG 

CGGCTATCTG 

GTTTCTT6CT 

CGCTCAATTA 

TCCCTGHTT 

AAAAATCGTT 

AATTAGGCTC 

GGTGCAAAAT 

TCGCTAAAAC 

CTATTGGGCG 

AGTGCGGTAC 

ATTGGTTTCT 

CTATTGTTGA 

TGGACAGAAT 

TGCCTCTGCC 

CTGTT6AGCG 



| 60
AAATGAAAAT 
TAAATCTACT 
CCGTACTTTA 
CTCTAAGCCA 
TCCTGACCTG 
ATATTTGAAG 
CTATAATAGT 
6TTTAAAGCA 
TATCCAGTCT 
TCGCTATTTT 
TATGCCTCGT 
ATCTCAACTG 
C6TAGATTTT 
AGGTAATTCA 
TCTGGTGTTT 
TTGGGTAATG 
GCGCCTGGTC 
ATGATT6ACC 
CACAATTTAT 
CGCTG6GGGT 
TGCCTTCGTA 
CTTTAGTCCT 
CTGAGGGTGA 
ATATCG6TTA 
TGTTTAAGAA 
GGAGCCTTTT 
TGTTCCTTTC 
AGAAAATTCA 
TGA6GGTTGT 
TTACGGTACA 
GGGTGGCGGT 
TGATACACCT 
TACT6AGCAA- 
TTTCATGTTT 
CACTGTTACT 
AAAAGCCATG 
CTTTAATGAA 
TCCTGTCAAT 
CTCTGAGGGT 
TGGTTCCGGT 
AAATGCCGAT 
TGATTACGGT 
TGGTGCTACT 
TAATTCACCT 
ATGTCGCCCT 
AATAAACTTA 
ATTTTCTACG 
GGTATTCCGT 
CTTACTTTTC 
CHATTATTG 
CCCTCTGACT 
TATGTTATTC 
TCTTATTTGG 
TGGAAAGACG 
AGCAACTAAT 
GCCTCGCGTT 
CG6TAATGAT 
TTGGTTTAAT 
ACATGCTCGT 
TAAACAGGCG 
TACTTTACCT 
TAAATTACAT 
TTGGCTTTAT
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

1§m iS^ffi^RJ iJliVMVr ^r^M'MW. IW^.m^^ gtcggtattt caaaccatta 3900 

|901 AATTTAGGTC AGAAGATGAA GCiTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3-60 
5961 TuTCTTGCGA tiggatttgc ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG HG20 

SSiJ ^J^SJfW^ ^i^^^l^^l^jc tcagacctat gattttgata aattcactat TGACTCTTcf mo 

^frfSflW ^IflMSK I^^^I^T^n ™AGGATT CTAAGG6AAA ATTAATTAAT ai?0 

SoS} aS^^^^^IJ '^^^GAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC ii200 

J201 ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT ^250 

;251 luTTTCATCA TCTTCTTTTG CTCA6GTAAT TGAAAT6AAT AATTCGCCTC TGC6CGAT7T i:320 

^321 TGTAACTTGG TATTCAAA6C AATCAG6CGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

J381 TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC WSo 

^^i^l TbTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCGATAA TTCA6AA6TA i<500 

il^] If KK^^^ ATATTGAT6A ATT6CCATCA TCT6ATAATC A6GAATATGA 4550 

J551 TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCC6CAA AATGATAATG TTACTCAAAC 4520 

rTrTAATflJT JM^^^^JJ^ GGGCAAAGGA TTTAATACGA GTTGTC6AAT TGTTTGTAAA 4580 

M] SfrrflfS KI^^4KS ^^^^I^T^II ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

ulm Urr^f^fS JW^^J^ITJ I^GATAACCT TCCTCAATTC CTTTCTACTG TTGATTT6CC 4800 

J801 AACTGACCAG ATATTGAiTG AGGGTTTGAT ATTT6AG6TT CAGCAAGGT6 AT6CTTTAGA 4860 

J861 TTTTTCATTT 6CTGCTGGCT CTCAGC6TG6 CACT6TTGCA GGCGGTGTTA ATACT6ACCG 4920 

;921 CCTCACCTCT GTTTTATCTT CTGCTG6TG6 TTCGTTC6GT ATTTTTAATG 6CGAT6TTTT 4980 

^^81 AbGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

50^ I5RCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100 

Mfii TrflAASf^I ^KKT^H^ AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGA6CG 5160 

11^] TCAAAATGTA GGTATTiCCA TGA6CGTTTT TCCTGTTGCA AT6GCTGGCG GTAATATTGT 5220 

ifif Kr^SKfll JffS^M^P P^^^I^^III 6AGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

III] rffrffS?? S^^fSilT^ ^I^^M^^^I IMFTGCGT GAT6GACAGA CTCTTTTACT 5340 

c/ot V9§IGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

lufa S^Iff^W? SfKffJ^^ J^III^^^JC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

5J61 ATACGT6CTC GTCAAAGCAA CCATAGTAC6 CGCCCTGTAG C6GCGCATTA AGC6C6GCGG 5520 

5521 6T6TGGTGGT JACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 

ill] TC6CTTTCTT CCCTTCCTTT CTC6CCAC6T TCGCC6GCTT TCCCC6TCAA GCTCTAAATC 5640 

q7ni flTTTrrrrrA rrrr^PSK 9^^111^91$ CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 
5701 ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACG6TTTTT CGCCCTTTGA 5750 

qR9i fxlTrTfrrf ptattJHH ^^T^^I^^^^ JCTTGTTCCA AACTGGAACA ACACTCAACC 5820 

^11] iWrrliTTT rJrrlPrUJ P^JII^J^^^ G6ATTTTGCC GATTTCGGAA CCACCATCAA 5880 

lol] ffirrfrrrr GGCAAACCAG C6T66ACCGC TTGCTGCAAC TCTCTCAG6G 5940 

Inm rrrrrr^^l^ Jf^^K^^I^ AGCTGTT6CC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 5000 

finfii ArfflfnrrTT CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6050 

Trflfrf A?II JfffPfffi^ AAAGCGGGCA GTGA6CGCAA CGCAATTAAT GTGAGTTAGC 5120 

6121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180

thl] rrrlrrrrr^ SlaffWffJ f^^^P^^^I^ ACTTG6CACT GGCCGTCGTT TTACAACGTC 5240 

fi^m SrrArTA^T rf ArrSPSf ^TTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300 

119] AAGCACTATT GCACTGGCAC TCTTACCGTT ACC6TTACTG TTTACCCCTG TGACAAAAGC 5350 

11?] frSffrTrlA rfrfl^flf^ fSf^^^PPJ ATTGT6CCCA GGGGATTGTA CTAGTGGATC 5420 

tbi] TrflrTAPATT rrfrflS^^^ S^^I^^^^^ IGCATTCAAT AGTTTACAGG CAAGTGCTAC 5480 

rrSi taaattaSJ PP9PJ^T9^J AGTAGTTATA GTTGGTGCTA CCATAGGGAT 6540 

tin] cMJWrl^ AAAAAGTTTA CGA6CAA6GC TTCTTAAGCA ATAGC6AAGA GGCCCGCACC 5600 

fififii rflrrAr Sir fffW^fSI AAT6GCGAAT GGCGCTTTGC CTGGTTTCCG 5550 

6661 GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CGATACGGTC 6720

fi7Ri TflTrrrflTTl frWHS^f? M^M^r^.^^ CCATCTACAC CAACGTAACC 5780 

ill] rrrArATTTA f^PS^^IP^ GCCGTTTGTT CCCACGGAGA ATCCGACGGG TTGTTACTCG 6840 

Mm frff^ffiW T^ISW^I^^ ^^5fI59?I'J CA6GAAGGCC AGACGCGAAT TATTTTTGAT 6900 

MM AAATATTAlf Hf^IJJ?^^ AAT6AGCTGA TTTAACAAAA ATTTAACGCG AATTTTAACA 5960 

6961 AAATATTAAC GTTTACAATT TAAATATTTG CTTATACAAT CTTCCTGTTT TTGGGGCTTT 7020

19,1] IxTrTrTTrT ttS^^^SS 9^WMJl^ ACATGCTAGT TTTACGATTA CCGTTCATCG 7080 

191] JIKIfrTAr rr^Sfff^? fSK^^^^^ ^I^^^^I^'^T AGCCTTTGTA GATCTCTCAA. 7140 

7141 AAATAGCTAC CCTCTCCGGC ATTAATTTAT CAGCTAGAAC GGTTGAATAT CATATTGATG 7200

79fii rrSiTrrATT taIaK5^5^ SJJfH^^9 CTTTTGAATC TTTACCTACA CATTACTCAG 7250 

7261 GCATTGCATT TAAAATATAT GAGGGTTCTA AAAATTTTTA TCCTTGCGTT GAAATAAAGG 7320

7321 CTTCTCCCGC AAAAGTATTA CAGGGTCATA ATGTTTTTGG TACAACCGAT TTAGCTTTAT 7380

7^} aStt^'^^'''^ TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440 

I 10 I 20 I 30 I ilO I 50 ! 60 
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I 10 ! 20 
AATGCTACTA CTATTAGTAG 
ATAGCTAAAC AGGTTATTGA 
CGTTCGCAGA ATTGGGAATC 
GTT6CATATT TAAAACATGT 
TCTGCAAAAA TGACCTCTTA 



1 
61 
121 
181 

241 

301 TT6GAGTTTG 
361 TCTTTC6GGC 
421 CAGG6TAAA6 
481 TTTGAG6GGG 
541 AA ACATT TTA 
G6TTTTTATC 
AAHCC 



601 
661 
721 
781 
841 
901 
951 
1021 
1081 
1141 
1201 
1261 
1321 
1381 
1441 
1501 
1561 
1621 
1681 
1741 
1801 
1861 
1921 
1981 
. 2041. 
2101 
2161 
2221 
2281 
2341 
2401 
2461 
2521 
2581 
2641 
2701 
2761 



i 30 
AATTGATGCC 
CCATTT6CGA 
AACTGTTACA 
TGAGCTACAG 
TCAAAAGGA6 
GGTTCGCTTT 
TCTTTTTGAT 
TGATTTATGG 
TATTTATGAC 
CTCT6GCAAA 
AAAC6AGG6T 
ATCTGCATTA 
TAATGTTGTT 
GTATAATGAG 
AAACCATCTC 



CTTCCGGTCT 
nCCTCTTAA 
ACCTGATTTT 
ATTCAATGAA 
CTATTACCCC 
GTCGTCTGGT 

6GCGTTATGT 

ATGAATCTTT CTACCTGTAA 
TCTTCCCAAC GTCCTGACTG 

CAAT6ATTAA AGTTGAAATT 

CTC6TCAGGG CAAGCCTTAT TCACTGAATG 
AATATCCGGT TCTTGTCAAG ATTACTCTT6 
TGTACACCGT TCATCTGTCC 
GTCTGC6CCT CGTTCCGGCT 
CAGGCGAT6A TACAAATCTC 
CAAAGATGAG TGTTTTAGTG 
GTATTTTACC 
GTAGCCGTTG 
AAAGCGGCCT 
ATGGTTGTTG 
AAAGCAAGCT 



! 50 
CTC6CGCCCC 
AT6GTCAAAC 
CTTCCA6ACA 
AGCAATTAAG 
TACTCTCTAA 
TTAAAACGCG 
TTGCTTCTGA 
TTTCTGAACT 
TATTGGACGC 
CAAAAGCCTC 



TCTTTCAAAG 
AAGTAACATG 



1 40 
ACCTTTTCAG 
AATGTATCTA 
TGGAATGAAA 
CACCAGATTC 
CAATTAAAGG 
GAAGCTCGAA 
GCAATCCGCT 
TCATTCTCGT 
GATTCCGCAG 

ACTTCTTTTG _ 

TATGATAGT6 TTGCTCTTAC 
GHGAATGTG GTATTCCTAA 
CCGTTAGTTC GTTTTATTAA 
CCAGTTCTTA AAATC6CATA 
AAGCCCAATT TACTACTCGT 
AGCAGCTTTG TTACGTTGAT 
ATGAAGGTCA GCCAGCCTAT 
TTG6TCAGTT CG6TTCCCTT 



GTG6CATTAC 
CAAAGCCTCT 
CGATCCCGCA 
TGC6TGGGCG 
ATTCACCTCG 
TTTTTGGAGA 
TATTCTCACT 
TTTACTAACG 
CTGTGGAATG 
TGGGTTCCTA 
TCTGAGGGTG 
ATTCC6G6CT 
AACCCCGCTA 
.CAGAATAATA 
CAAGGCACTG 
TATGACGCTT 
GATCCATTCG 
GCTGGC66CG 
GGCGGTTCTG 



i 50 
AAAT6AAAAT 50 
TAAATCTACT 120 
CCGTACTTTA 180 
CTCTAAGCCA 240 
TCCTGACCTG 300 
ATATTTGAAG 350 
CTATAATAGT 420 
GTTTAAAGCA 480 
TATCCAGTCT 540 
TCGCTATTTT 500 
TATGCCTC6T 550 
ATCTCAACTG 720 
CGTAGATTTT 780 
AGGTAATTCA 840 
TCTGGTGTTT 
TTGGGTAATG 
GCGCCTGGTC 
ATGATTGACC 
CACAATTTAT 
CGCTGGGGGT 
TGCCTTCGTA 
CTTTAGTCCT 
CTGAGGGTGA 
ATATCGGTTA 



TCTGGAAAGA 
CTACAG6C6T 
TTGGGCTTGC 
GCGGTTCT6A 
ATACTTATAT 



ACCCCGTTAA 
ACTGGAACGG 
THGIGAATA 
C6TCTGGTGG 
AGGGTGGC66 



AC6CTAACTA T6AGGGTTGT 
AAACTCAGTG TTAC6GTACA 
GTGGCTCTGA GGGTGGCGGT 
CTGAGTAC6G TGATACACCT 
ATCC6CCTGG TACTGAGCAA 



GAGCAGGTCG CGGATTTCGA 

CGTTGTACTT TGTTTCGCGC TTGGTATAAT 
TATTC7TTCG CCTCTTTCGT TTTAGGTTGG 
CGTTTAATGG AAACTTCCTC ATGAAAAAGT 
CTACCCTCGT TCCGATGCTG TCTTTCGCTG 

TTAACTCCCT GCAAGCCTCA GCGACCGAAT 

TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 
^^^^w^^w^. GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTH 
TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 
CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAAIJCA 
CGACAAAACT TTA6ATCGTT 
TGTAGTTTGT ACT66T6AC6 
TATCCCTGAA AATGAGGGTG 
GGGTGGCGGT ACTAAACCTC 

r.,r.^. .r.,r.. CAACCCTCTC GAC6GCACTT .^^^j^^^ 

ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT, 
GGTTGCGAAA -TAGGGA6GG6 GCATTAACTG- TTTATACGGG CACTGTTACT 
AACTTATTAC CAGTACACTC CTGTATCATC 
TAAATTCAGA GACTGCGCTT TCCATTCTGG 
TCAAGGCCAA TCGTCTGACC TGCCTCAACC 
TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG 
o«v.«u..v-.u nuuv, . ov.v.«u CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC 
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGG6 CTATGACCGA AAATGCCGAT 2460 
6AAAAC6CGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGI 2520 
^gg^g^^g^^ TCCGGCCTTG CTAATGGTAA 

TTCCCAAAT6 GCTCAAGTC6 GTGACGGTGA 
ATATTTACCT TCCCTCCCTC AATCGGTTGA 

ACCATATGAA TTTTCTATTG ATTGTGACAA :^,;:f:^>. ^i,^^ 

^/oi ,.uv.«.u«.« .v....«vv... TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820 
2821 TTTGCTAACA TACTGC6TAA TAAGGA6TCT TAATCATGCC AGTTCTTTTG G6TATTCC6T 2880 
2881 TATTATT6CG TTTCCTCG6T TTCCTTCTGG TAACTTTGTT CGGCTATCT6 CTTACTTTTC 2940 
TTAAAAAGG6 CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATT6 3000 



AAAAGCCATG 
CTTTAATGAA 

TCCTGTCAAT . 

CTCTGAGGGT 2340 
TGGTTCCGGT 2400 
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GCTGCTATCG ATG6TTTCAT 
GGTGATHTG CTGGCTCTAA 
TTAATGAATA ATTTCCGTCA 
TTTGTCTTTA GCGCTGGTAA 
TTCCGT6GT6 TCTTT6CGTT 



TGGTGCTACT 2580 
TAATTCACCT 2640 
ATGTCGCCCT 2700 
AATAAACTTA 2760 



2941 - 

3001 GGCTTAACTC AATTCTTGTG 
3061 TTGTTCAGGG TGTTCAGTTA 
3121 TCTCTGTAAA GGCTGCTATT 
3181 ATTGGGATAA ATAATATGGC 
3241 CTCGTTAGCG TTGGTAAGAT 
3301 CTTGATTTAA GGCTTCAAAA 
3361 CTTAGAATAC CGGATAAGCC 
3421 TCCTACGATG AAAATAAAAA 
3481 ACCCGTTCn GGAATGATAA 
3541 AAATTAGGAT GGGATATTAT 
3601 CGTTCTGCAT TAGCTGAACA 
3661 TTTGTCGGTA CTTTATATTC 
3721 GTTGGCGTTG TTAAATATGG 
3781 ACTGGTAAGA ATTTGTATAA 



GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCT6ACT 3060 
ATTCTCCCGT CTAATGC6CT TCCCTGTTTT TATGTTATTC 3120 
TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 
TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240 
TTA6GATAAA ATTGTAGCTG G6TGCAAAAT 
CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC 
TTCTATATCT 6ATTTGCTTG CTATTGGGCG 
CGGCTT6CTT GTTCTCGATG AGTGCGGTAC 
GGAAAGACAG CCGATTATTG ATTGGTTTCT 
TTTTCTTGTT CAGGACTTAT CTATTGTTGA 
TGTTGTrTAT TGTCGTCGTC TGGACAGAAT 
TCTTATTACT GGCTCGAAAA TGCCTCTGCC 
CGATTCTCAA TTAAGCCCTA CTGTTGAGCG 
CGCATATGAT ACTAAACAGG CTTTTTCTAG 
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CGGTAATGAT 3420 
TTGGTTTAAT 3480 
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TCC66TGTTT 
AATTTAGGTC 
TGTCTTGC6A 
GAGGTTAAAA 
CA6CGTCTTA 



ATTCTTATTT 
AGAA6ATGAA 
TTGGATTT6C 
AG6TAGTCTC 
ATCTAAGCTA 



AGCGACGATT TACAGAAGCA 
ATTAAAAAAG GTAATTCAAA 
TGTTTCATCA TCTTCTTTTG 
TGTAACTTG6 TATTCAAAGC 
TACTGTTACT GTATATTCAT 



AACGCCTTAT TTATCACACG 
GCTTACTAAA ATATATTTGA 
ATCAGCATTT ACATATA6TT 
TCAGACCTAT GATTTTGATA 
TCGCTAT6TT TTCAAG6ATT 
AGGTTATTCA CTCACATATA 



TGAAATTGTT 
CTCAGGTAAT 
AATCAGGCGA 
CTGACGTTAA 



TGTTTTACGT GCTAATAATT TTGATAT6GT 
TAATCCAAAC AATCAGGATT ATATTGATGA 
TGATAATTCC GCTCCTTCTG GTGGTTTCTT 
TTTTAAAATT AATAACGTTC GGGCAAAGGA 
GTCTAATACT TCTAAATCCT CAAATGTATT 
TAGTGCACCT AAAGATATTT TAGATAACCT 
AACTGACCAG ATATTGATTG AGGGTTTGAT 
TCATTT GCTGCTGGCT 



AAATGTAATT 
TGAAATGAAT 
ATCCGTTATT 
ACCTGAAAAT 
TGGTTCAATT 
ATTGCCATCA 
TGTTCCGCAA 
TTTAATAC6A 
ATCTATTGAC 
TCCTCAATTC 



CAAACCATTA 
ACGCGTTCTT 
ACCTAA6CCG 
T6ACTCTTCT 
ATTAATTAAT 
TACTGTTTCC 
TCTT6AT6TT 



TT 

CCTCACCTCT GTTTTATCTT 
AGGGCTATCA GTTCGCGCAT 
TATTCTTACG CTTTCAGGTC 
TACTGGTCGT GT6ACTGGTG 
TCAAAATGTA GGTATTTCCA 



TCTGGATATT 
TACTAATCAA 
C6GTG6CCTC 
AATCCCTTTA 
ATACGTGCTC 



ACCAGCAAGG 
AGAAGTATTG 
ACTGATTATA 



CTCAGCGTGG 
CT6CTGGTGG 
TAAAGACTAA 
AGAAGGGTTC 

AATCTGCCAA TGTAAATAAT 
TGAGC6TTTT TCCTGTTGCA 
CCGATAGTTT 6AGTTCTTCT 



GTCGGTATTT 
AAAAGTTTTC 
ATATAACCCA 
AATTCACTAT 
CTAAGGGAAA 
TTGATTTAT6 
AATTTTGTTT 

AATTCGCCTC TGCGCGATTT 
GTTTCTCCCG ATGTAAAAGG 
CTACGCAATT TCTTTATTTC 
CCTTCCATAA TTCAGAAGTA 
TCTGATAATC AGGAATATGA 
AATGATAATG TTACTCAAAC 
GTTGTCGAAT TGTTTGTAAA 
GGCTCTAATC TATTAGTTGT 
CTTTCTACTG TTGATTTGCC 
ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 
CACTGTTGCA GGCGGTGTTA ATACTGACCG 
TTCGTTCGGT ATTTTTAATG GCGATGTTTT 
TAGCCATTCA AAAATATTGT CTGT6CCACG 
TATCTCTGTT GGCCAGAATG TCCCTTTTAT 
CCATTTCAGA CGATTGAGCG 
ATGGCTGGCG GTAATATTGT 
ACTCAGGCAA GTGATGTTAT 
TAATTTGCGT GATGGACAGA 



TCAAGATTCT GGCGTACCGT 
CCGCTCTGAT TCCAACGAGG 
CGCCCTGTAG CGGCGCATTA 
CACTTGCCAG CGCCCTAGCG 



CGCCCTGATA GACGGTTTTT 
TCTTGTTCCA AACTGGAACA 
GGATTTTGCC GATTTCGGAA 
CGTGGACCGC TTGCTGCAAC 
CGTCTC6CTG 6TGAAAAGAA 



ACGACA6GTT TCCCGACTGG 
TCACTCATTA GGCACCCCAG 
nGTGAGCGG ATAACAATTT 



CTACAACGGT 
AAAACACTTC 
ATCGGCCTCC TGTTTAGCTC 
GTCAAAGCAA CCATAGTACG 
GT6TG6TGGT TACGCGCAGC GTGACCGCTA 

TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA 
GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACG6CA CCTC6ACCCC 
ATTTG66T6A TG6TTCACGT AGTGGGCCAT 
CGTTGGAGTC CACGTTCTTT AATAGTG6AC 
CTATCTCGGG CTATTCTTTT GATTTATAAG 
ACA6GATTTT CGCCTGCTGG GGCAAACCAG 
CCAGGC6GTG AA6GGCAATC AGCTGTTGCC 

GGCGCCCAAT AC6CAAACCG CCTCTCCCCG CGCGTTGGCC 6ATTCATTAA 

AAAGCGGGCA GTGAGCGCAA CGCAATTAAT 
GCTTTACACT TTATGCTTCC 

CACACGCGTC ACTTGGCACT 

GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT 

AAGCACTATT 6CACTGGCAC TCTTACCGTT ACTGTTTACC CCTGTGGCAA AAGCCTATGG 

5361 6GGGTTCATG CTTCTGAGGC ATCCGGGAGC TGAAGGCGAT 6ACCCTGCTA AGGCTGCATT 
6421 CAATAGTTTA CAGGCAAGTG CTACTGA6TA CATTGGCTAC GCTTGGGCTA TGGTAGTAGT 
6481 TATAGTTGGT GCTACCATAG G6ATTAAATT 
6541 AGCAATAGCG AAGAGGCCCG CACCGATCGC 
5601 GAATG6CGCT TTGCCTGGTT TCCGGCACCA 
6651 GATCTTCCTG AGGCCGATAC GGTCGTCGTC 
5721 GCGCCCATCT ACACCAACGT AACCTATCCC ATTACGGTCA 
6781 GAGAATCCGA CGGGTTGTTA CTCGCTCACA TTTAATGTTG 
6841 GGCCA6ACGC GAATTATTTT TGATGGC6TT CCTATTGGTT 
5901 AAAAATTTAA CGCGAATTTT AACAAAATAT TAACGTTTAC 
6961 CAATCTTCCT GTTTTTGGGG CTTTTCTGAT TATCAACCGG 
7021 TAGTTTTACG ATTACCGTTC ATCGATTCTC TTGTTTGCTC 
7081 TGATAGCCTT TGTAGATCTC TCAAAAATAG CTACCCTCTC 
7141 GAACGGTTGA ATATCATATT GATGGTGATT TGACTGTCTC 
7201 AATCTTTACC TACACATTAC TCAGGCATTG CATTTAAAAT 
7261 TTTATCCTTG CGTTGAAATA AAGGCTTCTC CCGCAAAAGT ATTACAGGGT 
7321 TTGGTACAAC CGATTTAGCT TTATGCTCTG AGGCTTTATT GCTTAATTTT 
7381 T6CCTTGCCT GTATGATTTA TTGGACGTT 
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CTCTTTTACT 
TCCTGTCTAA 
AAAGCACGTT 5460 
AGCGCGGCGG 5520 
CCCGCTCCTT 
GCTCTAAATC 
AAAAAACTTG 
C6CCCTTTGA 
ACACTCAACC 
CCACCATCAA 
TCTCTCAG6G 
AAACCACCCT 
TGCA6CTGGC 
GTGAGTTAGC 



GGCTCGTATG TTGTGTGGAA 
GGCCGTCGTT TTACAACGTC 
GGAGAAAATA AAGTGAAACA 



ATTCAAAAAG TTTACGAGCA AGGCTTCTTA 
CCTTCCCAAC AGTTGCGCAG CCTGAATGGC 
GAAGCGGTGC CGGAAAGCTG GCTGGAGTGC 



CCCTCAAACT GGCAGATGCA 
ATCCGCCGTT 
ATGAAAGCTG 
AAAAAATGAG 
AATTTAAATA 
GGTACATATG 
CAGACTCTCA 
CGGCATTAAT 
CGGCCTTTCT 
ATATGAGGGT 
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CGGTTACGAT 5720 
TGTTCCCACG 6780 
GCTACAGGAA 6840 
CTGATTTAAC 6900 
TTTGCTTATA 6950 
ATTGACATGC 7020 
GGCAATGACC 7080 
TTATCAGCTA 7140 
CACCCTTTTG 7200 
TCTAAAAATT 7250 
CATAATGTTT 7320 
GCTAATTCTT 7380 
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| 10 | 20 | 30 | 40 | 50 | 60
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCGTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920

1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980

1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 TAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
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3841 TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900
3901 AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960
3961 TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020
4021 GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080
4081 CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140
4141 AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200
4201 ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 4260
4261 GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 4320
4321 GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 4380 
4381 ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 4440 
4441 GTTTTACGTG CTAATAATTT TGATATGGTT CCTTCAATTC CTTCCATTAT TTAGAAGTAT 4500
4501 AATCCAAACA ATCAGGATTA TATTGATGAA TTGCCATCAT CTGATAATCA GGAATATGAT 4560
4561 GATAATTCCG CTCCTTCTGG TGGTTTCTTT GTTCCGCAAA ATGATAATGT TACTCAAACT 4620
4621 TTTAAAATTA ATAACGTTCG GGCAAAGGAT TTAATACGAG TTGTCGAATT GTTTGTAAAG 4680
4681 TCTAATACTT CTAAATCCTC AAATGTATTA TCTATTGACG GCTCTAATCT ATTAGTTGTT 4740 
4741 AGTGCACCTA AAGATATTTT AGATAACCTT CCTCAATTCC TTTCTACTGT TGATTTGCCA 4800
4801 ACTGACCAGA TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTTTAGAT 4860 
4861 TTTTCATTTG CTGCTGGCTC TCAGCGTGGC ACTGTTGCAG GCGGTGTTAA TACTGACCGC 4920
4921 CTCACCTCTG TTTTATCTTC TGCTGGTGGT TCGTTCGGTA TTTTTAATGG CGATGTTTTA 4980 
4981 GGGCTATCAG TTCGCGCATT AAAGACTAAT AGCCATTCAA AAATATTGTC TGTGCCACGT 5040 
5041 ATTCTTACGC TTTCAGGTCA GAAGGGTTCT ATCTCTGTTG GCCAGAATGT CCCTTTTATT 5100
5101 ACTGGTCGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAC GATTGAGCGT 5160 
5161 CAAAATGTAG GTATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT 5220 
5221 CTGGATATTA CCAGGAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATGTTATT 5280 
5281 ACTAATCAAA GAAGTATTGC TACAACGGTT AATTTGCGTG ATGGACAGAC TCTTTTACTC 5340 
5341 GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTG GCGTACCGTT CCTGTCTAAA 5400 
5401 ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC CGCTCTGATT CCAACGAGGA AAGCACGTTA 5460
5461 TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG 5520 
5521 TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCCTTT 5580 
5581 CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 5640 
5641 GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA 5700
5701 TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTTGAC 5760
5761 GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTCCAA ACTGGAACAA CACTCAACCC 5820 
5821 TATCTCGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC CACCATCAAA 5880
5881 CAGGATTTTC GCCTGCTGGG GCAAACCAGC GTGGACCGCT TGCTGCAACT CTCTCAGGGC 5940 
5941 CAGGCGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000
6001 GCGCCCAATA CGCAAACCGC CTCTCCCCGC GCGTTGGCCG ATTCATTAAT GCAGCTGGCA 6060
6061 CGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGTTAGCT 6120
6121 CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTGTGGAAT 6180
6181 TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240
6241 GTAGGAGAGC TCGGCGGATC CGAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 6300
6301 AGTTTACAGG CAAGTGCTAC TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360
6361 GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420
6421 GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA 6480
6481 ATGGCGAATG GCGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540
6541 AGTGCGATCT TCCTGAGGCC GATAGGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600 
6601 ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660
6661 CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720
6721 AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780
6781 TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840
6841 TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900
6901 CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960
6961 TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020
7021 AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080
7081 TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 
7141 AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 
7201 TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 
7261 TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294
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1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTCGTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA GCACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACAGCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160 
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280 
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940 
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180 
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360 
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
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3841 TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 
3901 AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960
3961 TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020
4021 GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080
4081 CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140
4141 AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200
4201 ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260
4261 TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320
4321 TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 
4381 TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 
4441 TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500
4501 TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560
4561 TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620
4621 TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680
4681 GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 
4741 TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800
4801 AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860
4861 TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920
4921 CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980
4981 AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 
5041 TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 5100
5101 TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160
5161 TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220
5221 TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280
5281 TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 
5341 CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 
5401 AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460
5461 ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520
5521 GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580
5581 TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640
5641 GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 
5701 ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760
5761 CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC 5820
5821 CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880
5881 ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 5940
5941 CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000
6001 GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060
6061 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120
6121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180
6181 TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 6240
6241 GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300 
6301 AAGCACTATT GCACTGGCAC TCTTACCGTT ACTGTTTACC CCTGTGGCAA AAGCCCTTCT 6360 
6361 GAGGCATCCG GGAGCTGAAG GCGATGACCC TGCTAAGGCT GCATTCAATA GTTTACAGGC 6420
6421 AAGTGCTACT GAGTACATTG GCTACGCTTG GGCTATGGTA GTAGTTATAG TTGGTGCTAC 6480
6481 CATAGGGATT AAATTATTCA AAAAGTTTAC GAGCAAGGCT TCTTAAGCAA TAGCGAAGAG 6540
6541 GCCCGCACCG ATCGCCCTTC CCAACAGTTG CGCAGCCTGA ATGGCGAATG GCGCTTTGCC 6600
6601 TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG AGTGCGATCT TCCTGAGGCC 6660
6661 GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT ACGATGCGCC CATCTACACC 6720
6721 AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC CCACGGAGAA TCCGACGGGT 6780
6781 TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC AGGAAGGCCA GACGCGAATT 6840
6841 ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT TTAACAAAAA TTTAACGCGA 6900
6901 ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC TTATACAATC TTCCTGTTTT 6960 
6961 TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA CATGCTAGTT TTACGATTAC 7020 
7021 CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA TGACCTGATA GCCTTTGTAG 7080
7081 ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC AGCTAGAACG GTTGAATATC 7140 
7141 ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC TTTTGAATCT TTACCTACAC 7200
7201 ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA AAATTTTTAT CCTTGCGTTG 7260
7261 AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA TGTTTTTGGT ACAACCGATT 7320 
7321 TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA TTCTTTGCCT TGCCTGTATG 7380
7381 ATTTATTGGA CGTT 7394 
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